Dataset¶
Dataset loading, batching, and sampling utilities.
This module wraps torchvision.datasets and custom dataset modules into
uniform infinite-batch generators. It also provides helpers for train/test
splitting and raw-tensor batching.
Example:¶
>>> from experiments import Dataset, make_datasets
>>> trainset, testset = make_datasets("cifar10", train_batch=128)
>>> inputs, targets = trainset.sample(config)
- class experiments.dataset.Dataset(data, name=None, root=None, *args, **kwargs)¶
Bases: object
Unified dataset wrapper producing infinite batches.
This class can wrap:
- A torchvision dataset loaded by name.
- A custom generator yielding batches forever.
- A single fixed batch repeated forever.
- Parameters:
data (str, generator, or object) – Dataset name, infinite generator, or single batch.
name (str or None, optional) – User-defined name for debugging.
root (str or pathlib.Path or None, optional) – Cache root directory. None uses the default.
*args (object) – Forwarded to the dataset constructor when data is a string.
**kwargs (object) – Forwarded to the dataset constructor when data is a string.
- Raises:
tools.UnavailableException – If data is an unknown dataset name.
TypeError – If constructor arguments are invalid.
- epoch(config=None)¶
Return a finite epoch iterator.
Note
Only works for DataLoader-based datasets.
- Parameters:
config (experiments.Configuration or None, optional) – Target configuration for tensor placement.
- Returns:
generator – Finite iterator over one epoch.
- classmethod get_default_root()¶
Lazily initialize and return the default dataset cache directory.
- Returns:
pathlib.Path – Path to the dataset cache. Falls back to the system temp directory if the default does not exist.
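The lazy fallback described above can be sketched as follows. This is an illustrative stand-in, not the library's implementation, and the `"~/.cache/datasets"` default path is a hypothetical placeholder:

```python
import pathlib
import tempfile

def get_default_root(default="~/.cache/datasets"):
    # "~/.cache/datasets" is a hypothetical placeholder for the real default.
    path = pathlib.Path(default).expanduser()
    # Fall back to the system temp directory if the default does not exist.
    return path if path.exists() else pathlib.Path(tempfile.gettempdir())
```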
- sample(config=None)¶
Sample the next batch.
- Parameters:
config (experiments.Configuration or None, optional) – Target configuration for tensor placement.
- Returns:
tuple – Next batch, optionally moved to the target device.
- experiments.dataset.batch_dataset(inputs, labels, train=False, batch_size=None, split=0.75)¶
Batch a raw tensor dataset into infinite sampler generators.
- Parameters:
inputs (torch.Tensor) – Input data tensor.
labels (torch.Tensor) – Label tensor with the same first-dimension size as inputs.
train (bool, optional) – Whether to build a training set (adds shuffling) or a test set.
batch_size (int or None, optional) – Batch size. None or 0 uses the full split size.
split (float or int, optional) – Fraction of samples for training when < 1, or absolute count when >= 1.
- Returns:
generator – Infinite sampler generator.
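The split and batching semantics above can be sketched with plain lists standing in for tensors; this is a simplified illustration of the documented behavior, not the library's actual tensor-based implementation:

```python
import random

def batch_dataset(inputs, labels, train=False, batch_size=None, split=0.75):
    """Yield (inputs, labels) batches forever from one side of a train/test split."""
    n = len(inputs)
    # split < 1 is a fraction of samples; split >= 1 is an absolute count.
    cut = int(n * split) if split < 1 else int(split)
    lo, hi = (0, cut) if train else (cut, n)
    size = hi - lo
    # None or 0 means full-split batches.
    bs = size if not batch_size else min(batch_size, size)
    idx = list(range(lo, hi))
    while True:
        if train:
            random.shuffle(idx)  # training sets are reshuffled on every pass
        for start in range(0, size, bs):
            sel = idx[start:start + bs]
            yield [inputs[i] for i in sel], [labels[i] for i in sel]
```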
- experiments.dataset.get_default_transform(dataset, train)¶
Return the default transform for a torchvision dataset.
- experiments.dataset.make_datasets(dataset, train_batch=None, test_batch=None, train_transforms=None, test_transforms=None, num_workers=1, **custom_args)¶
Build training and testing dataset wrappers.
- Parameters:
dataset (str) – Case-sensitive dataset name.
train_batch (int or None, optional) – Training batch size. None or 0 for full-batch.
test_batch (int or None, optional) – Testing batch size. None or 0 for full-batch.
train_transforms (callable or None, optional) – Transform for the training set. None uses the default.
test_transforms (callable or None, optional) – Transform for the testing set. None uses the default.
num_workers (int or tuple[int, int], optional) – Number of workers for the training and testing loaders. An int applies to both; a tuple specifies (train_workers, test_workers).
**custom_args (object) – Additional keyword arguments forwarded to the dataset constructor.
- Returns:
tuple[Dataset, Dataset] – Training and testing dataset wrappers.
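The num_workers convention (an int for both loaders, or a per-loader tuple) can be captured by a small helper; the name normalize_workers is hypothetical and only illustrates the documented rule:

```python
def normalize_workers(num_workers):
    """Return (train_workers, test_workers) from an int or a 2-tuple."""
    # An int applies to both loaders; a tuple is (train_workers, test_workers).
    if isinstance(num_workers, int):
        return num_workers, num_workers
    train_workers, test_workers = num_workers
    return train_workers, test_workers
```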
- experiments.dataset.make_sampler(loader)¶
Create an infinite sampler from a DataLoader.
- Parameters:
loader (torch.utils.data.DataLoader) – Finite data loader.
- Yields:
tuple – Batches, transparently restarting the loader when exhausted.
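The restart-on-exhaustion behavior reduces to a short generator over any re-iterable loader (a DataLoader is one); this sketch assumes only that iterating the loader yields batches:

```python
def make_sampler(loader):
    """Yield batches forever, restarting the finite loader when it is exhausted."""
    while True:
        # Each pass drains the loader once; the outer loop restarts it.
        yield from loader
```

Because a DataLoader with shuffling re-shuffles on every fresh iteration, each restart also yields a newly shuffled pass.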
See also
For the model that consumes these datasets, see Model. For the configuration that controls tensor placement, see Configuration.