Datasets
LIBSVM dataset loaders.
This module provides builder functions that can be registered automatically by
the experiments.Dataset loader because they are listed in __all__.
Each builder downloads the raw LIBSVM file on first use, caches a pre-processed
PyTorch tensor version, and returns an infinite-batch generator.
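A LIBSVM file stores one sample per line as `<label> <index>:<value> ...`. Feature indices are 1-based, and zero-valued features are omitted. The pre-processing step presumably turns these sparse lines into dense arrays before caching. The sketch below shows that conversion in plain Python, with lists standing in for PyTorch tensors. The helper `parse_libsvm_lines` and its `num_features` argument are hypothetical and are not part of `experiments`.

```python
from typing import Iterable, List, Tuple


def parse_libsvm_lines(
    lines: Iterable[str], num_features: int
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: densify LIBSVM-format lines into rows and labels.

    Each non-empty line looks like "<label> <index>:<value> ...", where
    indices are 1-based and omitted features are implicitly zero.
    """
    inputs: List[List[float]] = []
    labels: List[float] = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        labels.append(float(tokens[0]))
        row = [0.0] * num_features
        for token in tokens[1:]:
            index, value = token.split(":")
            row[int(index) - 1] = float(value)  # 1-based file index -> 0-based list index
        inputs.append(row)
    return inputs, labels


x, y = parse_libsvm_lines(["1 1:0.5 3:-1", "-1 2:1"], num_features=3)
print(x)  # [[0.5, 0.0, -1.0], [0.0, 1.0, 0.0]]
print(y)  # [1.0, -1.0]
```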
Example:
>>> from experiments import Dataset
>>> dataset = Dataset("svm-phishing", train=True, download=True)
>>> inputs, labels = dataset.sample()
See Also:
- experiments.batch_dataset – helper used internally to create the infinite
sampler from raw tensors.
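One plausible shape for such an infinite sampler is sketched below, using plain Python lists instead of tensors. It reshuffles at the start of every epoch and yields `(inputs, labels)` batches forever. A `batch_size` of `None` or `0` means "the full split", matching the builder parameters documented on this page. `infinite_batches` is a hypothetical name. The real `experiments.batch_dataset` may shuffle, sample with replacement, or drop the short final batch differently.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


def infinite_batches(
    inputs: Sequence[X],
    labels: Sequence[Y],
    batch_size: Optional[int] = None,
    seed: int = 0,
) -> Iterator[Tuple[List[X], List[Y]]]:
    """Hypothetical sketch: yield shuffled (inputs, labels) batches forever."""
    n = len(inputs)
    if n == 0 or n != len(labels):
        raise ValueError("need a non-empty split with one label per input")
    if not batch_size:  # None or 0: one batch holding the whole split
        batch_size = n
    if batch_size < 0:
        raise ValueError("batch_size must be non-negative")
    rng = random.Random(seed)
    order = list(range(n))
    while True:  # each pass over `order` is one epoch
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield [inputs[i] for i in idx], [labels[i] for i in idx]


gen = infinite_batches(list(range(10)), [i % 2 for i in range(10)], batch_size=4)
xb, yb = next(gen)  # 4 inputs and their 4 matching labels
```

With 10 samples and `batch_size=4`, each epoch in this sketch yields batches of 4, 4 and 2 samples. Because it is a generator, invalid arguments only raise on the first `next()` call.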
- experiments.datasets.svm.phishing(train=True, batch_size=None, root=None, download=False, *args, **kwargs)
Phishing dataset builder returning an infinite-batch generator.
- Parameters:
train (bool, optional) – Whether to return the training split. If False, the test split is returned instead.
batch_size (int or None, optional) – Number of samples per batch. None or 0 yields the full split in a single batch.
root (pathlib.Path or str or None, optional) – Cache directory. None defaults to experiments.dataset.Dataset.get_default_root().
download (bool, optional) – Whether to allow downloading the raw file if the cache is missing.
*args (object) – Ignored (kept for API compatibility).
**kwargs (object) – Ignored (kept for API compatibility).
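The interaction between `root`, `download` and the cache described at the top of this module can be sketched as below. This is an assumption about the control flow, not the module's code. `load_or_build`, its `fetch_raw`/`preprocess` callables and the `default_root` argument (standing in for `Dataset.get_default_root()`) are all hypothetical. `pickle` stands in for the PyTorch tensor cache the builders keep.

```python
import pickle
from pathlib import Path
from typing import Any, Callable, Optional, Union


def load_or_build(
    root: Optional[Union[str, Path]],
    name: str,
    download: bool,
    fetch_raw: Callable[[], str],
    preprocess: Callable[[str], Any],
    default_root: Path,
) -> Any:
    """Hypothetical sketch of the cache-or-download step behind a builder."""
    base = Path(root) if root is not None else default_root  # mirrors root=None
    cache_file = base / f"{name}.pkl"
    if cache_file.exists():
        with cache_file.open("rb") as f:
            return pickle.load(f)  # cache hit: no download, no parsing
    if not download:
        raise FileNotFoundError(f"{cache_file} missing and download=False")
    data = preprocess(fetch_raw())  # first use: fetch raw text, pre-process it
    base.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as f:
        pickle.dump(data, f)
    return data
```

In this sketch, `download=False` only matters on a cold cache. Once the cache file exists, later calls never touch the network.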
Returns
-------
generator – Infinite sampler yielding (inputs, labels) tuples.

Notes
-----
The dataset is split at position 8400, so the first 8400 samples (≈ 76 %) form the training split and the remainder (≈ 24 %) forms the test split.
The split point was chosen for good divisibility: \(8400 = 2^4 \times 3 \times 5^2 \times 7\).
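The factorization above can be checked mechanically. One reading of "good divisibility" is that many batch sizes divide 8400 evenly, so the training split breaks into whole batches with no short final batch. That reading is an interpretation, not something the source states. The sketch below factorizes 8400 by trial division and lists such batch sizes. Both helper names are hypothetical.

```python
from typing import Dict, List


def factorize(n: int) -> Dict[int, int]:
    """Prime factorization by trial division, as {prime: exponent}."""
    factors: Dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1  # leftover factor is prime
    return factors


def even_batch_sizes(n: int, limit: int) -> List[int]:
    """Batch sizes up to `limit` that split n samples into whole batches."""
    return [b for b in range(1, limit + 1) if n % b == 0]


print(factorize(8400))  # {2: 4, 3: 1, 5: 2, 7: 1}
print(even_batch_sizes(8400, 64))
```

Because 8400 contains only \(2^4\), batch sizes of 32 or 64 do not divide it evenly, while 16, 24, 48 and 56 do.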
See also
For the dataset wrapper that loads these builders, see Dataset.
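Registration through `__all__` can be pictured as a mapping from names to builders. The sketch below fakes the `svm` module with `types.ModuleType`. The `"svm-phishing"` key format is inferred from the `Dataset("svm-phishing", ...)` example, and it is an assumption about how `Dataset` composes names. The `build_registry` helper is hypothetical.

```python
import types
from typing import Callable, Dict


def build_registry(module: types.ModuleType, prefix: str) -> Dict[str, Callable]:
    """Hypothetical sketch: map '<prefix>-<name>' to each callable in __all__."""
    registry: Dict[str, Callable] = {}
    for name in getattr(module, "__all__", []):
        builder = getattr(module, name)
        if callable(builder):
            registry[f"{prefix}-{name}"] = builder
    return registry


# Stand-in for experiments.datasets.svm with a single dummy builder.
svm = types.ModuleType("svm")
svm.phishing = lambda train=True, **kwargs: f"phishing(train={train})"
svm.__all__ = ["phishing"]

registry = build_registry(svm, "svm")
print(registry["svm-phishing"](train=False))  # phishing(train=False)
```

Listing a builder in `__all__` is the only opt-in step in this sketch. Helpers not listed there stay private to the module.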