Dataset

Dataset loading, batching, and sampling utilities.

This module wraps torchvision.datasets and custom dataset modules into uniform infinite-batch generators. It also provides helpers for train/test splitting and raw-tensor batching.

Example:

>>> from experiments import Dataset, make_datasets
>>> trainset, testset = make_datasets("cifar10", train_batch=128)
>>> inputs, targets = trainset.sample()
class experiments.dataset.Dataset(data, name=None, root=None, *args, **kwargs)

Bases: object

Unified dataset wrapper producing infinite batches.

This class can wrap:

  • A torchvision dataset loaded by name.

  • A custom generator yielding batches forever.

  • A single fixed batch repeated forever.

Parameters:
  • data (str, generator, or object) – Dataset name, infinite generator, or single batch.

  • name (str or None, optional) – User-defined name for debugging.

  • root (str or pathlib.Path or None, optional) – Cache root directory. None uses the default.

  • *args (object) – Forwarded to the dataset constructor when data is a string.

  • **kwargs (object) – Forwarded to the dataset constructor when data is a string.

Raises:
  • tools.UnavailableException – If data is an unknown dataset name.

  • TypeError – If constructor arguments are invalid.
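The custom-generator and single-batch modes can be illustrated in plain Python. This is an illustrative sketch only, not the module's implementation; `repeat_batch` is a hypothetical helper, and lists stand in for tensors:

```python
def repeat_batch(batch):
    """Yield the same fixed batch forever, mirroring the single-batch mode."""
    while True:
        yield batch

# A Dataset wrapping such a generator would draw from it indefinitely.
gen = repeat_batch(([0.1, 0.2], [1, 0]))
inputs, targets = next(gen)
assert (inputs, targets) == ([0.1, 0.2], [1, 0])
assert next(gen) == next(gen)  # the fixed batch repeats without end
```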

epoch(config=None)

Return a finite epoch iterator.

Note

Only works for DataLoader-based datasets.

Parameters:
  • config (experiments.Configuration or None, optional) – Target configuration for tensor placement.

Returns:

generator

Finite iterator over one epoch.
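The difference from sample() is that the iterator terminates after one pass. A minimal sketch of that contract, with a list standing in for a DataLoader (illustrative only):

```python
def epoch_sketch(loader):
    # One finite pass over the underlying loader, unlike sample(),
    # which restarts the loader and never exhausts.
    yield from loader

batches = [("x1", "y1"), ("x2", "y2")]
assert list(epoch_sketch(batches)) == batches  # exactly one pass, then done
```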

classmethod get_default_root()

Lazily initialize and return the default dataset cache directory.

Returns:

pathlib.Path

Path to the dataset cache. Falls back to the system temp directory if the default does not exist.

sample(config=None)

Sample the next batch.

Parameters:
  • config (experiments.Configuration or None, optional) – Target configuration for tensor placement.

Returns:

tuple

Next batch, optionally moved to the target device.

experiments.dataset.batch_dataset(inputs, labels, train=False, batch_size=None, split=0.75)

Batch a raw tensor dataset into infinite sampler generators.

Parameters:
  • inputs (torch.Tensor) – Input data tensor.

  • labels (torch.Tensor) – Label tensor with the same first-dimension size as inputs.

  • train (bool, optional) – Whether to build a training set (adds shuffling) or a test set.

  • batch_size (int or None, optional) – Batch size. None or 0 uses the full split size.

  • split (float or int, optional) – Fraction of samples for training when < 1, or absolute count when >= 1.

Returns:

generator

Infinite sampler generator.
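The split and batch_size semantics documented above can be sketched in isolation. This is a hypothetical helper reproducing the documented rules, not the actual implementation:

```python
def resolve_split(n_samples, split=0.75, batch_size=None):
    """Sketch of the documented split/batch_size rules for the training split."""
    # split: fraction of samples when < 1, absolute count when >= 1.
    n_train = int(n_samples * split) if split < 1 else int(split)
    # batch_size: None or 0 falls back to the full split size.
    effective_batch = batch_size or n_train
    return n_train, effective_batch

assert resolve_split(100, split=0.75) == (75, 75)               # fractional split, full batch
assert resolve_split(100, split=60, batch_size=16) == (60, 16)  # absolute split
```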

experiments.dataset.get_default_transform(dataset, train)

Return the default transform for a torchvision dataset.

Parameters:
  • dataset (str or None) – Case-sensitive dataset name. None returns None.

  • train (bool) – Whether to return the training transform. Ignored when dataset is None.

Returns:

torchvision.transforms.Compose or None

Composed transform, or None if the dataset is unknown.
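The case-sensitive name handling can be sketched as a plain lookup table. Everything here is hypothetical (strings stand in for Compose objects; the real module's table is not shown):

```python
# Hypothetical defaults table; strings stand in for torchvision Compose objects.
_DEFAULTS = {
    ("cifar10", True): "cifar10-train-transform",
    ("cifar10", False): "cifar10-test-transform",
}

def default_transform(dataset, train):
    if dataset is None:
        return None
    return _DEFAULTS.get((dataset, bool(train)))  # None for unknown names

assert default_transform(None, True) is None
assert default_transform("CIFAR10", True) is None  # case-sensitive: no match
assert default_transform("cifar10", False) == "cifar10-test-transform"
```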

experiments.dataset.make_datasets(dataset, train_batch=None, test_batch=None, train_transforms=None, test_transforms=None, num_workers=1, **custom_args)

Build training and testing dataset wrappers.

Parameters:
  • dataset (str) – Case-sensitive dataset name.

  • train_batch (int or None, optional) – Training batch size. None or 0 for full-batch.

  • test_batch (int or None, optional) – Testing batch size. None or 0 for full-batch.

  • train_transforms (callable or None, optional) – Transform for the training set. None uses the default.

  • test_transforms (callable or None, optional) – Transform for the testing set. None uses the default.

  • num_workers (int or tuple[int, int], optional) – Number of workers for the training and testing loaders. An int applies to both; a tuple specifies (train_workers, test_workers).

  • **custom_args (object) – Additional keyword arguments forwarded to the dataset constructor.

Returns:

tuple[Dataset, Dataset]

Training and testing dataset wrappers.
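The num_workers normalization described above (int for both loaders, tuple for per-loader counts) can be sketched as follows; `normalize_workers` is a hypothetical helper, not part of the module:

```python
def normalize_workers(num_workers=1):
    # An int applies to both loaders; a tuple is (train_workers, test_workers).
    if isinstance(num_workers, tuple):
        train_w, test_w = num_workers
    else:
        train_w = test_w = int(num_workers)
    return train_w, test_w

assert normalize_workers(4) == (4, 4)
assert normalize_workers((2, 1)) == (2, 1)
```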

experiments.dataset.make_sampler(loader)

Create an infinite sampler from a DataLoader.

Parameters:
  • loader (torch.utils.data.DataLoader) – Finite data loader.

Yields:

tuple

Batches, transparently restarting the loader when exhausted.
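The restart behavior can be sketched in a few lines. DataLoaders are re-iterable, so "restarting" is simply iterating again; a plain list has the same property and stands in here (illustrative only, not the module's implementation):

```python
import itertools

def make_sampler_sketch(loader):
    # Re-iterate the loader forever; each new for-loop is a fresh pass.
    while True:
        yield from loader

batches = [("x1", "y1"), ("x2", "y2")]
sampled = list(itertools.islice(make_sampler_sketch(batches), 5))
assert sampled == batches + batches + batches[:1]  # loader restarted twice
```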

See also

For the model that consumes these datasets, see Model. For the configuration that controls tensor placement, see Configuration.