Datasets

LIBSVM dataset loaders.

This module provides builder functions that can be registered automatically by the experiments.Dataset loader because they are listed in __all__. Each builder downloads the raw LIBSVM file on first use, caches a pre-processed PyTorch tensor version, and returns an infinite-batch generator.
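The caching step described above can be sketched in plain Python. This is a minimal illustration, not the module's code. The names `parse_libsvm_lines` and `load_cached` are hypothetical. The download step is omitted. `pickle` with nested lists stands in for the PyTorch tensors the real builders cache. The sketch relies only on the standard LIBSVM text format, where each line reads `<label> <index>:<value> ...` and feature indices are 1-based.

```python
import pickle
import tempfile
from pathlib import Path


def parse_libsvm_lines(lines, num_features):
    """Parse LIBSVM text ("<label> <index>:<value> ...") into dense rows.

    LIBSVM feature indices are 1-based; features absent from a line are 0.
    """
    inputs, labels = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        labels.append(float(parts[0]))
        row = [0.0] * num_features
        for item in parts[1:]:
            index, value = item.split(":")
            row[int(index) - 1] = float(value)  # convert 1-based to 0-based
        inputs.append(row)
    return inputs, labels


def load_cached(raw_path, cache_path, num_features):
    """Hypothetical helper: parse raw_path once, then reuse the pickled cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Cache hit: the raw file is never read again.
        with cache_path.open("rb") as f:
            return pickle.load(f)
    with open(raw_path) as f:
        data = parse_libsvm_lines(f, num_features)
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "toy.txt"
        raw.write_text("1 1:0.5 3:1\n0 2:-1\n")
        inputs, labels = load_cached(raw, Path(tmp) / "toy.pkl", num_features=3)
        print(inputs, labels)
```

A second call to `load_cached` reads only the cache file, which mirrors the "pre-process once, reuse afterwards" behaviour described above.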

Example:

>>> from experiments import Dataset
>>> dataset = Dataset("svm-phishing", train=True, download=True)
>>> inputs, labels = dataset.sample()
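The name "svm-phishing" suggests that builders are looked up by a module prefix plus the function name. The sketch below shows one way an `__all__`-driven registry could work. `register_builders`, the stand-in module, and the "<prefix>-<name>" naming rule are all assumptions inferred from the example, not the actual `experiments.Dataset` logic.

```python
import types

# Hypothetical stand-in for experiments.datasets.svm: a module whose
# public builders are listed in __all__.
svm = types.ModuleType("svm")


def phishing(train=True, batch_size=None, root=None, download=False, *args, **kwargs):
    # Placeholder body; the real builder returns an infinite-batch generator.
    return ("phishing", train, batch_size)


svm.phishing = phishing
svm.__all__ = ["phishing"]


def register_builders(module, prefix):
    """Map "<prefix>-<name>" to every builder named in module.__all__ (assumed rule)."""
    return {f"{prefix}-{name}": getattr(module, name) for name in module.__all__}


registry = register_builders(svm, "svm")
builder = registry["svm-phishing"]
```

Driving registration from `__all__` means a new builder becomes loadable just by being exported, with no separate registration call.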

See Also:

experiments.batch_dataset – helper used internally to create the infinite sampler from raw tensors.
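Here is a minimal sketch of an infinite sampler in the spirit of `experiments.batch_dataset`, written against plain Python lists. The function name `infinite_batches`, the per-pass reshuffle, and dropping the trailing partial batch are assumptions, not documented behaviour. Only the rule that a `batch_size` of `None` or `0` yields the whole split comes from the parameter description of `phishing`.

```python
import random


def infinite_batches(inputs, labels, batch_size=None, seed=0):
    """Yield (inputs, labels) batches forever (hypothetical sketch)."""
    n = len(inputs)
    if not batch_size:
        # None or 0: the full split is yielded as one batch, every time.
        while True:
            yield inputs, labels
    if batch_size > n:
        # Otherwise no full batch exists and the loop below would never yield.
        raise ValueError("batch_size larger than the dataset")
    rng = random.Random(seed)
    order = list(range(n))
    while True:
        # Reshuffle at the start of every pass over the data.
        rng.shuffle(order)
        # Step through full batches only; a trailing partial batch is dropped.
        for start in range(0, n - batch_size + 1, batch_size):
            idx = order[start:start + batch_size]
            yield [inputs[i] for i in idx], [labels[i] for i in idx]
```

Dropping the partial batch keeps every batch the same size, which is convenient for fixed-shape training steps. A real helper might instead pad the last batch or wrap around.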

experiments.datasets.svm.phishing(train=True, batch_size=None, root=None, download=False, *args, **kwargs)

Phishing dataset builder returning an infinite-batch generator.

Parameters:
  • train (bool, optional) – Whether to return the training split. If False, the test split is returned instead.

  • batch_size (int or None, optional) – Number of samples per batch. None or 0 yields the full split in a single batch.

  • root (pathlib.Path or str or None, optional) – Cache directory. None defaults to experiments.dataset.Dataset.get_default_root().

  • download (bool, optional) – Whether to allow downloading the raw file if the cache is missing.

  • *args (object) – Ignored (kept for API compatibility).

  • **kwargs (object) – Ignored (kept for API compatibility).

Returns:
  generator – Infinite sampler yielding (inputs, labels) tuples.

Notes:
  The dataset is split at position 8400 (≈ 76 % train / 24 % test). The split point was chosen for good divisibility: \(8400 = 2^4 \times 3 \times 5^2 \times 7\).
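The split arithmetic can be checked directly. This sketch makes two assumptions: that the split takes samples in file order, and that the LIBSVM phishing file holds 11,055 samples (consistent with the ≈ 76 % figure). Reading "good divisibility" as "many batch sizes divide the training split evenly" is likewise an interpretation. `split_indices` and `prime_factors` are hypothetical helpers.

```python
SPLIT = 8400
NUM_SAMPLES = 11055  # assumption: number of samples in the LIBSVM phishing file


def prime_factors(n):
    """Return {prime: exponent} for n by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1  # leftover prime factor
    return factors


def split_indices(num_samples, split=SPLIT):
    """Assumed rule: the first `split` samples train, the rest test."""
    return range(0, split), range(split, num_samples)


train_idx, test_idx = split_indices(NUM_SAMPLES)
train_fraction = len(train_idx) / NUM_SAMPLES
# Batch sizes up to 100 that tile the training split with no remainder.
even_batch_sizes = [b for b in range(1, 101) if SPLIT % b == 0]
```

Under these assumptions the test split has 2,655 samples, and common batch sizes such as 16, 48 and 100 divide the training split exactly, while 64 does not (8400 has only four factors of 2).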

See also

For the dataset wrapper that loads these builders, see Dataset.