MonnaSimulation
MoNNA decentralised simulation.
- Reference:
Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, and John Stephan. “Robust Collaborative Learning with Linear Gradient Overhead.” In Proceedings of the 40th International Conference on Machine Learning (ICML 2023).
- class krum.simulations.decentralised.monna_icml_2023.MonnaSimulation(*, model: Model, data: Sequence[Iterable[tuple[Tensor, Tensor]]], loss_fn: Callable[[Tensor, Tensor], Tensor], n: int, f: int, learning_rate: float, beta: float = 0.99, attack: type[Attack] | None = None, attack_kwargs: dict[str, Any] | None = None, aggregator: type[Aggregator] | None = None, aggregator_kwargs: dict[str, Any] | None = None, byzantine_reach: Literal['all', 'sampled'] = 'all', seed: int | None = None)
Bases: DecentralisedSimulation[MonnaStepResult]
MoNNA simulation runner.
Each round, every honest worker runs one local momentum-SGD step and then replaces its model with a nearest-neighbor average over the ``n - 2f`` models closest to its own, drawn from the ``n - f`` models it received that round (its own plus a set of responders).
MoNNA owns the local optimisation rule (momentum-SGD) and its state, so the momentum lives here rather than in DecentralisedSimulation.
``byzantine_reach`` selects the adversary model used when forming those received sets in gather_received_models():
"all" is the worst case: every Byzantine model reaches every worker, and only the honest responders are randomized; the robustness measured is not inflated by an adversary that randomly misses some workers.
"sampled" draws responders uniformly from all other nodes, so a worker may receive anywhere from ``0`` to ``f`` Byzantine models, modelling gossip where Byzantine reach is itself random.
Both modes keep the received-set size at ``n - f``; only the Byzantine composition differs.
- build_step_result(*, honest_gradients: Tensor, local_parameters: Tensor, byzantine_parameters: Tensor, mixed_parameters: Tensor, losses: Tensor) -> MonnaStepResult
Build the MoNNA snapshot, including the committed momentum.
- Parameters:
honest_gradients – Stacked honest gradients this round.
local_parameters – Post-local-update honest models.
byzantine_parameters – Byzantine models injected this round.
mixed_parameters – Mixed models (equal to the committed parameters).
losses – Per-worker losses.
- Returns:
A snapshot dict with the step index and a detached clone of each
tensor produced this step.
- compute_local_parameter_updates(momentum: Tensor) -> Tensor
Compute ``theta_{t+1/2}`` before the model-mixing phase.
- Parameters:
momentum – The next momentum, one row per honest worker.
- Returns:
The post-local-update parameters, one row per honest worker.
- gather_received_models(honest_vectors: Tensor, byzantine_parameters: Tensor, *, worker_index: int) -> Tensor
Build the set of ``n - f`` models received by one honest worker.
The worker’s own model leads the set so a pivot-anchored aggregator can rely on its position; the remaining ``n - f - 1`` models are placed according to ``byzantine_reach``.
- Parameters:
honest_vectors – Post-local-update honest models, one row per worker.
byzantine_parameters – Byzantine models, shape ``(f, d)``.
worker_index – Index of the receiving honest worker.
- Returns:
The ``n - f`` received models, with the worker’s own model first.
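The mixing step that consumes this received set can be illustrated with a small sketch. It is not the library's Aggregator: the function name `nearest_neighbor_average` is hypothetical, NumPy arrays stand in for the tensors, and Euclidean distance is assumed as the closeness measure. It only shows why placing the worker's own model in row 0 is convenient for a pivot-anchored rule.

```python
import numpy as np


def nearest_neighbor_average(received: np.ndarray, f: int) -> np.ndarray:
    """Average the rows of `received` closest to the pivot in row 0.

    `received` holds n - f models with the worker's own model first, so
    dropping the f farthest rows leaves n - 2f models to average.
    """
    keep = received.shape[0] - f
    if keep < 1:
        raise ValueError("need more than f received models")
    # Squared Euclidean distance from every received model to the pivot.
    distances = np.sum((received - received[0]) ** 2, axis=1)
    # The pivot has distance 0, so it is always among the kept rows.
    nearest = np.argsort(distances, kind="stable")[:keep]
    return received[nearest].mean(axis=0)


# n = 5, f = 1: four received models, the last one an outlier.
received = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]])
mixed = nearest_neighbor_average(received, f=1)
```

In this toy input the outlier row is the farthest from the pivot, so it is excluded and the result is the mean of the three remaining rows.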
- local_update(gradients: Tensor) -> Tensor
Run MoNNA’s momentum-SGD local step and commit the new momentum.
- Parameters:
gradients – Stacked honest gradients, one row per worker.
- Returns:
The post-local-update parameters ``theta_{t+1/2}``, one row per
honest worker.
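As a rough illustration of such a local step, the sketch below applies one momentum-SGD update to all honest workers at once. The exact update rule is not stated on this page, so the exponential-moving-average form used here is an assumption, as are the function name `momentum_sgd_step` and the use of NumPy arrays in place of tensors.

```python
import numpy as np


def momentum_sgd_step(theta, momentum, gradients, learning_rate, beta):
    """One local step for all honest workers (one row per worker)."""
    # Assumed momentum rule: exponential moving average of the gradients.
    new_momentum = beta * momentum + (1.0 - beta) * gradients
    # theta_{t+1/2}: parameters after the local step, before mixing.
    theta_half = theta - learning_rate * new_momentum
    return theta_half, new_momentum


theta = np.zeros((2, 3))
momentum = np.zeros((2, 3))
gradients = np.ones((2, 3))
theta_half, momentum = momentum_sgd_step(
    theta, momentum, gradients, learning_rate=0.1, beta=0.9
)
```

Returning the new momentum alongside the parameters mirrors the idea of committing the momentum in the same call that produces ``theta_{t+1/2}``.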
- select_honest_responder_indices(*, worker_index: int, device: device) -> Tensor
Randomly select the ``n - 2f - 1`` other honest workers that respond to one worker.
Used by the "all" reach mode, where the ``f`` Byzantine models are always included, so the honest responders fill the remaining slots.
- Parameters:
worker_index – Index of the receiving honest worker, excluded from the selection.
device – Device on which to build the index tensors.
- Returns:
The selected honest responder indices, shape ``(n - 2f - 1,)``.
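The slot counting in the "all" mode can be checked with a short sketch: one slot for the worker itself, ``f`` for the Byzantine models, and ``n - 2f - 1`` for honest responders, which sums to ``n - f``. The helper name `pick_honest_responders`, the NumPy random generator, and the convention that honest workers occupy indices ``0`` to ``n - f - 1`` are all assumptions made for illustration.

```python
import numpy as np


def pick_honest_responders(n, f, worker_index, rng):
    """Pick n - 2f - 1 distinct honest workers other than `worker_index`."""
    honest = n - f
    # Assumption: honest workers are indexed 0 .. n - f - 1.
    others = np.array([i for i in range(honest) if i != worker_index])
    # Sample without replacement so no responder appears twice.
    return rng.choice(others, size=n - 2 * f - 1, replace=False)


rng = np.random.default_rng(0)
responders = pick_honest_responders(n=10, f=2, worker_index=3, rng=rng)
# Own model + f Byzantine models + honest responders = n - f received models.
received_size = 1 + 2 + len(responders)
```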
- select_received_model_indices(*, worker_index: int, device: device) -> Tensor
Randomly select the ``n - f - 1`` nodes received by one honest worker.
Used by the "sampled" reach mode, where responders are drawn uniformly from every other node, honest or Byzantine.
- Parameters:
worker_index – Index of the receiving honest worker, excluded from the selection.
device – Device on which to build the index tensors.
- Returns:
The selected node indices, shape ``(n - f - 1,)``.
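A minimal sketch of the "sampled" draw follows, again with hypothetical names and NumPy in place of tensors. It assumes honest nodes are indexed ``0`` to ``n - f - 1`` and Byzantine nodes ``n - f`` to ``n - 1``, which this page does not confirm, and it counts how many Byzantine nodes happened to be drawn.

```python
import numpy as np


def pick_received_nodes(n, f, worker_index, rng):
    """Pick n - f - 1 distinct nodes, honest or Byzantine, other than the worker."""
    others = np.array([i for i in range(n) if i != worker_index])
    # Uniform draw without replacement over every other node.
    return rng.choice(others, size=n - f - 1, replace=False)


n, f = 10, 2
rng = np.random.default_rng(0)
picked = pick_received_nodes(n=n, f=f, worker_index=3, rng=rng)
# Assumption: Byzantine nodes are indexed n - f .. n - 1.
byzantine_count = int(np.sum(picked >= n - f))
```

Repeating the draw with different seeds gives a Byzantine count that varies between ``0`` and ``f``, which is the randomness in reach that this mode is meant to model.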
- class krum.simulations.decentralised.monna_icml_2023.MonnaStepResult
Bases: StepResult
MoNNA snapshot: base fields plus per-worker momentum.
See also
For the base class, see DecentralisedSimulation.