Dataset

Dataset loading, batching, and sampling utilities.

This module wraps torchvision.datasets and custom dataset modules into uniform infinite-batch generators. It also provides helpers for train/test splitting and raw-tensor batching.

Example:

>>> from experiments import Dataset, make_datasets
>>> trainset, testset = make_datasets("cifar10", train_batch=128)
>>> inputs, targets = trainset.sample()
class experiments.dataset.Dataset(data, name=None, root=None, *args, **kwargs)

Bases: object

Unified dataset wrapper producing infinite batches.

This class can wrap:

  • A torchvision dataset loaded by name.

  • A custom generator yielding batches forever.

  • A single fixed batch repeated forever.

Parameters:
  • data (str, generator, or object) – Dataset name, infinite generator, or single batch.

  • name (str or None, optional) – User-defined name for debugging.

  • root (str or pathlib.Path or None, optional) – Cache root directory. None uses the default.

  • *args (object) – Forwarded to the dataset constructor when data is a string.

  • **kwargs (object) – Forwarded to the dataset constructor when data is a string.

Raises:
  • tools.UnavailableException – If data is an unknown dataset name.

  • TypeError – If constructor arguments are invalid.
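The custom-generator and single-batch modes can be illustrated in plain Python. This is an illustrative sketch only, not the module's implementation; `repeat_batch` is a hypothetical helper, and lists stand in for tensors:

```python
def repeat_batch(batch):
    """Yield the same fixed batch forever, mirroring the single-batch mode."""
    while True:
        yield batch

# A Dataset wrapping such a generator would draw from it indefinitely.
gen = repeat_batch(([0.1, 0.2], [1, 0]))
inputs, targets = next(gen)
assert (inputs, targets) == ([0.1, 0.2], [1, 0])
assert next(gen) == next(gen)  # the fixed batch repeats without end
```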

epoch(config=None)

Return a finite epoch iterator.

Note

Only works for DataLoader-based datasets.

Parameters:
  • config (experiments.Configuration or None, optional) – Target configuration for tensor placement.

Returns:

generator

Finite iterator over one epoch.
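The difference from sample() is that the iterator terminates after one pass. A minimal sketch of that contract, with a list standing in for a DataLoader (illustrative only):

```python
def epoch_sketch(loader):
    # One finite pass over the underlying loader, unlike sample(),
    # which restarts the loader and never exhausts.
    yield from loader

batches = [("x1", "y1"), ("x2", "y2")]
assert list(epoch_sketch(batches)) == batches  # exactly one pass, then done
```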

classmethod get_default_root()

Lazily initialize and return the default dataset cache directory.

Returns:

pathlib.Path

Path to the dataset cache. Falls back to the system temp directory if the default does not exist.

sample(config=None)

Sample the next batch.

Parameters:
  • config (experiments.Configuration or None, optional) – Target configuration for tensor placement.

Returns:

tuple

Next batch, optionally moved to the target device.

experiments.dataset.batch_dataset(inputs, labels, train=False, batch_size=None, split=0.75)

Batch a raw tensor dataset into infinite sampler generators.

Parameters:
  • inputs (torch.Tensor) – Input data tensor.

  • labels (torch.Tensor) – Label tensor with the same first-dimension size as inputs.

  • train (bool, optional) – Whether to build a training set (adds shuffling) or a test set.

  • batch_size (int or None, optional) – Batch size. None or 0 uses the full split size.

  • split (float or int, optional) – Fraction of samples for training when < 1, or absolute count when >= 1.

Returns:

generator

Infinite sampler generator.
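The split and batch_size semantics documented above can be sketched in isolation. This is a hypothetical helper reproducing the documented rules, not the actual implementation:

```python
def resolve_split(n_samples, split=0.75, batch_size=None):
    """Sketch of the documented split/batch_size rules for the training split."""
    # split: fraction of samples when < 1, absolute count when >= 1.
    n_train = int(n_samples * split) if split < 1 else int(split)
    # batch_size: None or 0 falls back to the full split size.
    effective_batch = batch_size or n_train
    return n_train, effective_batch

assert resolve_split(100, split=0.75) == (75, 75)               # fractional split, full batch
assert resolve_split(100, split=60, batch_size=16) == (60, 16)  # absolute split
```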

experiments.dataset.get_default_transform(dataset, train)

Return the default transform for a torchvision dataset.

Parameters:
  • dataset (str or None) – Case-sensitive dataset name. None returns None.

  • train (bool) – Whether to return the training transform. Ignored when dataset is None.

Returns:

torchvision.transforms.Compose or None

Composed transform, or None if the dataset is unknown.
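The case-sensitive name handling can be sketched as a plain lookup table. Everything here is hypothetical (strings stand in for Compose objects; the real module's table is not shown):

```python
# Hypothetical defaults table; strings stand in for torchvision Compose objects.
_DEFAULTS = {
    ("cifar10", True): "cifar10-train-transform",
    ("cifar10", False): "cifar10-test-transform",
}

def default_transform(dataset, train):
    if dataset is None:
        return None
    return _DEFAULTS.get((dataset, bool(train)))  # None for unknown names

assert default_transform(None, True) is None
assert default_transform("CIFAR10", True) is None  # case-sensitive: no match
assert default_transform("cifar10", False) == "cifar10-test-transform"
```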

experiments.dataset.make_datasets(dataset, train_batch=None, test_batch=None, train_transforms=None, test_transforms=None, num_workers=1, **custom_args)

Build training and testing dataset wrappers.

Parameters:
  • dataset (str) – Case-sensitive dataset name.

  • train_batch (int or None, optional) – Training batch size. None or 0 for full-batch.

  • test_batch (int or None, optional) – Testing batch size. None or 0 for full-batch.

  • train_transforms (callable or None, optional) – Transform for the training set. None uses the default.

  • test_transforms (callable or None, optional) – Transform for the testing set. None uses the default.

  • num_workers (int or tuple[int, int], optional) – Number of workers for the training and testing loaders. An int applies to both; a tuple specifies (train_workers, test_workers).

  • **custom_args (object) – Additional keyword arguments forwarded to the dataset constructor.

Returns:

tuple[Dataset, Dataset]

Training and testing dataset wrappers.
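The num_workers normalization described above (int for both loaders, tuple for per-loader counts) can be sketched as follows; `normalize_workers` is a hypothetical helper, not part of the module:

```python
def normalize_workers(num_workers=1):
    # An int applies to both loaders; a tuple is (train_workers, test_workers).
    if isinstance(num_workers, tuple):
        train_w, test_w = num_workers
    else:
        train_w = test_w = int(num_workers)
    return train_w, test_w

assert normalize_workers(4) == (4, 4)
assert normalize_workers((2, 1)) == (2, 1)
```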

experiments.dataset.make_sampler(loader)

Create an infinite sampler from a DataLoader.

Parameters:
  • loader (torch.utils.data.DataLoader) – Finite data loader.

Yields:

tuple

Batches, transparently restarting the loader when exhausted.
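The restart behavior can be sketched in a few lines. DataLoaders are re-iterable, so "restarting" is simply iterating again; a plain list has the same property and stands in here (illustrative only, not the module's implementation):

```python
import itertools

def make_sampler_sketch(loader):
    # Re-iterate the loader forever; each new for-loop is a fresh pass.
    while True:
        yield from loader

batches = [("x1", "y1"), ("x2", "y2")]
sampled = list(itertools.islice(make_sampler_sketch(batches), 5))
assert sampled == batches + batches + batches[:1]  # loader restarted twice
```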

See also

For the model that consumes these datasets, see Model. For the configuration that controls tensor placement, see Configuration.