Jobs Utilities
Experiment job management helpers.
This module provides utilities for running and managing experiment jobs in a reproducible manner.
Classes and Functions
- Job orchestration
  - Command encapsulates a command with seed, device, and result-directory arguments.
  - Jobs manages parallel execution of experiments on multiple devices.
- Helpers
  - dict_to_cmdlist converts dictionaries into command-line argument lists.
  - move_directory moves an existing directory aside with versioning.
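The signature and versioning scheme of move_directory are not documented on this page. As a rough illustration of what "moving a directory aside with versioning" could look like, here is a minimal sketch. The function name move_directory_sketch and the numeric `<name>.<n>` suffix convention are assumptions made for this example, not the module's actual behavior.

```python
from pathlib import Path
import tempfile


def move_directory_sketch(path):
    """Rename an existing directory to the first free '<name>.<n>' sibling.

    Hypothetical stand-in for move_directory; the suffix scheme is assumed.
    Returns the new path, or None if there was nothing to move.
    """
    path = Path(path)
    if not path.is_dir():
        return None
    n = 1
    while True:
        target = path.with_name(f"{path.name}.{n}")
        if not target.exists():
            break
        n += 1
    path.rename(target)
    return target


# Demonstration inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "results"
    results.mkdir()
    first = move_directory_sketch(results)   # results -> results.1
    results.mkdir()
    second = move_directory_sketch(results)  # results -> results.2
    moved_names = (first.name, second.name)
    missing = move_directory_sketch(Path(tmp) / "absent")
```

Moving the old directory aside rather than deleting it keeps earlier results available when an experiment is rerun under the same name.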
Example:
from tools import Command, Jobs, dict_to_cmdlist
cmd = Command(["python", "train.py", "--lr", "0.01"])
jobs = Jobs("./results", devices=["cuda:0", "cuda:1"])
jobs.submit("exp1", cmd)
jobs.wait()
jobs.close()
- class tools.jobs.Command(command: Iterable[str])
Bases: object
Command wrapper that adds standard runtime arguments.
This class wraps a base command and automatically appends seed, device, and result-directory arguments when building the final command line.
- Parameters:
command (iterable of str) – Base command to wrap, given as a sequence of arguments.
- build(seed: int | str, device: str, resdir: Path | str) → list[str]
Build the final command line.
- Parameters:
seed (int or str) – Seed for this run of the experiment.
device (str) – Device on which to run the experiment (e.g. "cuda:0").
resdir (pathlib.Path or str) – Target directory path for results.
- Returns:
list of str – Final command list ready to be passed to subprocess.run.
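To make the behavior of build concrete, here is a minimal sketch of a wrapper that appends the three runtime arguments to a base command. The class name CommandSketch and the flag names `--seed`, `--device`, and `--resdir` are assumptions for illustration; this page does not state which flags the real Command emits or in what order.

```python
from pathlib import Path
from typing import Iterable, List, Union


class CommandSketch:
    """Illustrative stand-in for Command; the flag names are assumptions."""

    def __init__(self, command: Iterable[str]):
        # Copy so that later changes to the caller's iterable do not leak in.
        self._command = list(command)

    def build(self, seed: Union[int, str], device: str,
              resdir: Union[Path, str]) -> List[str]:
        # Append the runtime arguments after the base command.
        return self._command + [
            "--seed", str(seed),
            "--device", device,
            "--resdir", str(resdir),
        ]


cmd = CommandSketch(["python", "train.py", "--lr", "0.01"])
argv = cmd.build(seed=1, device="cuda:0", resdir="results/exp1/1")
```

Because build returns a new list on every call, a single wrapped command can be reused for each seed and device without one run's arguments leaking into the next.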
- class tools.jobs.Jobs(res_dir: Path | str, devices: list[str] | None = None, devmult: int = 1, seeds: Sequence[int] | None = None)
Bases: object
Job execution manager for parallel experiments.
Manages parallel execution of experiments across multiple devices, with support for result tracking and error handling.
- Parameters:
res_dir (pathlib.Path or str) – Directory to store results.
devices (list of str, optional) – List of device names (e.g. ["cuda:0", "cuda:1"]). Defaults to ["cpu"] if none specified.
devmult (int, optional) – Number of parallel jobs per device. Default is 1.
seeds (sequence of int, optional) – Seeds to use for repeating experiments. Default is range(1, 6).
- Attributes:
_res_dir (pathlib.Path) – Resolved result directory.
_jobs (list of tuple or None) – Pending job queue as (name, seed, command) tuples, or None when the manager has been closed.
_workers (list of threading.Thread) – Worker thread pool, one entry per active slot.
_seeds (tuple of int) – Seeds used for repeating experiments.
_lock (threading.Lock) – Main lock protecting shared state.
_cvready (threading.Condition) – Condition variable to signal that new jobs are available or that workers must shut down.
_cvdone (threading.Condition) – Condition variable to signal that all submitted jobs have been processed.
- get_seeds() → tuple[int, ...]
Get the seeds used for repeating the experiments.
Returns:
- tuple of int
Seeds used by this manager.
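The attributes listed for Jobs describe a common pattern: a pending queue guarded by one lock, with one condition variable to wake workers and another to signal completion. The sketch below shows that pattern in isolation. It is not the module's implementation: the class JobsSketch, the `_running` counter, and the `results` list are assumptions, and plain Python callables stand in for the (name, seed, command) tuples and subprocess launches of the real manager.

```python
import threading


class JobsSketch:
    """Illustrative queue with one lock and two condition variables."""

    def __init__(self, n_workers=2):
        self._lock = threading.Lock()
        self._cvready = threading.Condition(self._lock)  # new jobs or shutdown
        self._cvdone = threading.Condition(self._lock)   # all jobs processed
        self._jobs = []      # pending callables; None once closed
        self._running = 0    # jobs currently executing (assumed bookkeeping)
        self.results = []
        self._workers = [threading.Thread(target=self._work)
                         for _ in range(n_workers)]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            with self._lock:
                # Sleep until there is a job or the manager is closed.
                while self._jobs is not None and not self._jobs:
                    self._cvready.wait()
                if self._jobs is None:
                    return
                job = self._jobs.pop(0)
                self._running += 1
            value = job()  # run the job outside the lock
            with self._lock:
                self.results.append(value)
                self._running -= 1
                if not self._jobs and self._running == 0:
                    self._cvdone.notify_all()

    def submit(self, job):
        with self._lock:
            if self._jobs is None:
                raise RuntimeError("manager is closed")
            self._jobs.append(job)
            self._cvready.notify()

    def wait(self):
        with self._lock:
            while self._jobs or self._running:
                self._cvdone.wait()

    def close(self):
        with self._lock:
            self._jobs = None  # signals shutdown to the workers
            self._cvready.notify_all()
        for worker in self._workers:
            worker.join()


jobs = JobsSketch(n_workers=2)
for i in range(5):
    jobs.submit(lambda i=i: i * i)
jobs.wait()
jobs.close()
squares = sorted(jobs.results)
```

Setting the queue to None on close mirrors the documented meaning of `_jobs`, and lets workers distinguish "nothing to do yet" from "shut down" with a single check.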
- tools.jobs.dict_to_cmdlist(dp: dict[str, Any]) → list[str]
Convert a dictionary into command-line arguments.
This helper is useful for turning experiment configurations into CLI arguments.
- Parameters:
dp (dict of str to Any) – Dictionary mapping parameter names to values.
- Returns:
list of str – Command-line arguments such as ["--lr", "0.01", "--batch", "32"].
Notes:
- Boolean values are included only when they are True.
- Lists and tuples expand to repeated --name value pairs.
Example:
>>> dict_to_cmdlist({"lr": 0.01, "batch": 32, "debug": True})
['--lr', '0.01', '--batch', '32', '--debug']
>>> dict_to_cmdlist({"layers": [64, 128]})
['--layers', '64', '--layers', '128']
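The conversion rules in the notes can be sketched in a few lines. This is an illustrative reimplementation under the stated rules only, named dict_to_cmdlist_sketch to keep it distinct from the real helper. How the actual function treats values such as None or nested dictionaries is not documented here, so the sketch simply converts any other value with str.

```python
from typing import Any, Dict, List


def dict_to_cmdlist_sketch(dp: Dict[str, Any]) -> List[str]:
    """Turn a parameter dictionary into a flat list of CLI arguments."""
    args: List[str] = []
    for name, value in dp.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Booleans become bare flags, and only when True.
            if value:
                args.append(flag)
        elif isinstance(value, (list, tuple)):
            # Sequences expand to repeated "--name value" pairs.
            for item in value:
                args.extend([flag, str(item)])
        else:
            # Assumption: every other value is passed through str().
            args.extend([flag, str(value)])
    return args


flags = dict_to_cmdlist_sketch(
    {"lr": 0.01, "batch": 32, "debug": True, "amp": False}
)
layers = dict_to_cmdlist_sketch({"layers": [64, 128]})
```

The boolean check comes first because bool is a subclass of int in Python; without it, True would be emitted as the pair `--debug True` instead of the bare flag.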
See also
For general utilities (parsing, timing, registries), see Miscellaneous Utilities. For PyTorch tensor helpers, see PyTorch Utilities.