MonnaSimulation

MoNNA decentralised simulation.

Reference:

Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, and John Stephan. “Robust Collaborative Learning with Linear Gradient Overhead.” In Proceedings of the 40th International Conference on Machine Learning (ICML 2023).

class krum.simulations.decentralised.monna_icml_2023.MonnaSimulation(*, model: Model, data: Sequence[Iterable[tuple[Tensor, Tensor]]], loss_fn: Callable[[Tensor, Tensor], Tensor], n: int, f: int, learning_rate: float, beta: float = 0.99, attack: type[Attack] | None = None, attack_kwargs: dict[str, Any] | None = None, aggregator: type[Aggregator] | None = None, aggregator_kwargs: dict[str, Any] | None = None, byzantine_reach: Literal['all', 'sampled'] = 'all', seed: int | None = None)

Bases: DecentralisedSimulation[MonnaStepResult]

MoNNA simulation runner.

Each round, every honest worker runs one local momentum-SGD step and then replaces its model with a nearest-neighbor average over the n - 2f models closest to its own, drawn from the n - f models it received that round (its own plus a set of responders).
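The nearest-neighbor averaging step described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: it uses plain Python lists instead of tensors, assumes Euclidean distance, and the helper name nearest_neighbor_average is hypothetical.

```python
import math


def nearest_neighbor_average(received, keep):
    """Average the `keep` vectors closest to received[0] (Euclidean distance).

    Hypothetical helper: `received` is a list of equal-length vectors whose
    first entry is the worker's own model (the pivot).
    """
    pivot = received[0]
    # The pivot has distance 0 to itself, so it is always among the nearest.
    ranked = sorted(received, key=lambda v: math.dist(v, pivot))
    nearest = ranked[:keep]
    dim = len(pivot)
    return [sum(v[i] for v in nearest) / keep for i in range(dim)]


n, f = 5, 1
# n - f = 4 received models: the worker's own first, then three responders.
received = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [100.0, 100.0]]
# Keep the n - 2f = 3 models closest to the pivot; the outlier is discarded.
mixed = nearest_neighbor_average(received, keep=n - 2 * f)
```

With these inputs the far-away fourth model is excluded, so the result is the mean of the first three vectors.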

MoNNA owns the local optimisation rule (momentum-SGD) and its state, so the momentum lives here rather than in DecentralisedSimulation.

byzantine_reach selects the adversary model used when forming those received sets in gather_received_models():

  • "all" is the worst case: every Byzantine model reaches every worker, and only the honest responders are randomized. The measured robustness is therefore not inflated by an adversary that happens to miss some workers.

  • "sampled" draws responders uniformly from all other nodes, so a worker may receive anywhere from 0 to f Byzantine models, modelling gossip where Byzantine reach is itself random.

Both modes keep the received-set size at n - f; only the Byzantine composition differs.
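The difference between the two modes can be sketched with a small index sampler. This is an illustration under assumptions, not the simulation's code: the function name sample_responder_indices is hypothetical, and the node numbering (honest nodes first, Byzantine nodes last) is assumed for the example.

```python
import random


def sample_responder_indices(n, f, worker_index, reach, rng):
    """Return the n - f - 1 node indices that respond to one honest worker.

    Assumed numbering (not taken from the library): nodes 0 .. n-f-1 are
    honest, nodes n-f .. n-1 are Byzantine.
    """
    honest_others = [i for i in range(n - f) if i != worker_index]
    byzantine = list(range(n - f, n))
    if reach == "all":
        # All f Byzantine nodes respond; honest nodes fill the other
        # n - 2f - 1 slots at random.
        return rng.sample(honest_others, n - 2 * f - 1) + byzantine
    if reach == "sampled":
        # Responders are drawn uniformly from every other node.
        return rng.sample(honest_others + byzantine, n - f - 1)
    raise ValueError(f"unknown reach mode: {reach!r}")


n, f = 7, 2
rng = random.Random(0)
all_mode = sample_responder_indices(n, f, 0, "all", rng)
sampled_mode = sample_responder_indices(n, f, 0, "sampled", rng)
```

Adding the worker's own model to either result gives a received set of size n - f.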

build_step_result(*, honest_gradients: Tensor, local_parameters: Tensor, byzantine_parameters: Tensor, mixed_parameters: Tensor, losses: Tensor) → MonnaStepResult

Build the MoNNA snapshot, including the committed momentum.

Parameters:
  • honest_gradients – Stacked honest gradients this round.

  • local_parameters – Post-local-update honest models.

  • byzantine_parameters – Byzantine models injected this round.

  • mixed_parameters – Mixed models (equal to the committed parameters).

  • losses – Per-worker losses.

Returns:

A snapshot dict with the step index and a detached clone of each tensor produced this step.

compute_local_parameter_updates(momentum: Tensor) → Tensor

Compute theta_{t+1/2} before the model-mixing phase.

Parameters:

momentum – The next momentum, one row per honest worker.

Returns:

The post-local-update parameters, one row per honest worker.

gather_received_models(honest_vectors: Tensor, byzantine_parameters: Tensor, *, worker_index: int) → Tensor

Build the n - f set of models received by one honest worker.

The worker’s own model leads the set so a pivot-anchored aggregator can rely on its position; the remaining n - f - 1 models are placed according to byzantine_reach.

Parameters:
  • honest_vectors – Post-local-update honest models, one row per worker.

  • byzantine_parameters – Byzantine models, shape (f, d).

  • worker_index – Index of the receiving honest worker.

Returns:

The n - f received models, with the worker’s own model first.
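As a rough sketch of the ordering guarantee, the received set can be assembled from a list of responder indices with the worker's own model placed in row 0. The helper assemble_received and its indexing convention are hypothetical and only illustrate the layout; plain lists stand in for tensors.

```python
def assemble_received(honest_vectors, byzantine_parameters,
                      worker_index, responder_indices):
    """Stack one worker's received set with its own model in row 0.

    Assumed indexing (hypothetical): node i < len(honest_vectors) is honest
    worker i; larger indices address rows of byzantine_parameters.
    """
    n_honest = len(honest_vectors)
    received = [honest_vectors[worker_index]]  # own model always leads
    for node in responder_indices:
        if node < n_honest:
            received.append(honest_vectors[node])
        else:
            received.append(byzantine_parameters[node - n_honest])
    return received


honest = [[0.0], [1.0], [2.0], [3.0]]  # n - f = 4 honest models
byzantine = [[99.0]]                   # f = 1 Byzantine model
received = assemble_received(
    honest, byzantine, worker_index=2, responder_indices=[0, 3, 4]
)
```

Here n = 5 and f = 1, so the worker's own model plus three responders gives the expected n - f = 4 rows.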

local_update(gradients: Tensor) → Tensor

Run MoNNA’s momentum-SGD local step and commit the new momentum.

Parameters:

gradients – Stacked honest gradients, one row per worker.

Returns:

The post-local-update parameters theta_{t+1/2}, one row per honest worker.
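This page does not spell out the update rule, so the following sketch assumes the common damped form of momentum SGD: m_t = beta * m_{t-1} + (1 - beta) * g_t, followed by theta_{t+1/2} = theta_t - learning_rate * m_t. The function name momentum_sgd_step is hypothetical and plain lists stand in for tensors; treat the formula as an assumption to check against the source or the paper.

```python
def momentum_sgd_step(params, momentum, gradients, learning_rate, beta):
    """One local momentum-SGD step per honest worker (one row per worker).

    Assumed rule (not confirmed by this page):
        m_t           = beta * m_{t-1} + (1 - beta) * g_t
        theta_{t+1/2} = theta_t - learning_rate * m_t
    """
    next_momentum = [
        [beta * m + (1.0 - beta) * g for m, g in zip(m_row, g_row)]
        for m_row, g_row in zip(momentum, gradients)
    ]
    half_step = [
        [p - learning_rate * m for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(params, next_momentum)
    ]
    return half_step, next_momentum


params = [[1.0, 2.0]]        # one honest worker, two parameters
momentum = [[0.0, 0.0]]      # momentum starts at zero
gradients = [[10.0, -10.0]]
half_step, next_momentum = momentum_sgd_step(
    params, momentum, gradients, learning_rate=0.1, beta=0.9
)
```

In this sketch the returned momentum corresponds to what local_update commits, and half_step corresponds to the parameters handed to the model-mixing phase.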

select_honest_responder_indices(*, worker_index: int, device: device) → Tensor

Randomly select the n - 2f - 1 other honest workers that respond to one worker.

Used by the "all" reach mode, where the f Byzantine models are always included, so the honest responders fill the remaining slots.

Parameters:
  • worker_index – Index of the receiving honest worker, excluded from the selection.

  • device – Device on which to build the index tensors.

Returns:

The selected honest responder indices, shape (n - 2f - 1,).

select_received_model_indices(*, worker_index: int, device: device) → Tensor

Randomly select the n - f - 1 nodes received by one honest worker.

Used by the "sampled" reach mode, where responders are drawn uniformly from every other node, honest or Byzantine.

Parameters:
  • worker_index – Index of the receiving honest worker, excluded from the selection.

  • device – Device on which to build the index tensors.

Returns:

The selected node indices, shape (n - f - 1,).

update_local_momentum(gradients: Tensor) → Tensor

Update each honest worker’s local momentum vector.

Parameters:

gradients – Stacked honest gradients, one row per worker.

Returns:

The next momentum, with one row per honest worker.

class krum.simulations.decentralised.monna_icml_2023.MonnaStepResult

Bases: StepResult

MoNNA snapshot: the base fields plus the per-worker momentum.

momentum: Tensor

See also

For the base class, see DecentralisedSimulation.