MultiKrum

MultiKrum aggregation rule, multi-gradient averaging.

Reference:

Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. “Machine learning with adversaries: Byzantine tolerant gradient descent.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).

class aggregators.multikrum.MultiKrum

Bases: Aggregator

MultiKrum aggregation rule, multi-gradient averaging.

Scores every worker gradient by the sum of its distances to its \(n - f - 1\) closest neighbors, picks the \(m\) gradients with the smallest scores, and returns their mean. With \(m = 1\) it reduces to Krum.
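The selection rule described above can be sketched in a few lines. This is a minimal, dependency-free illustration on plain Python lists, not the library's implementation: the name `multikrum_sketch` is hypothetical, it assumes plain (non-squared) Euclidean distance since the text does not say whether distances are squared, and it breaks score ties by worker index, which is also an assumption.

```python
import math
from typing import List, Sequence


def multikrum_sketch(gradients: Sequence[Sequence[float]], f: int, m: int) -> List[float]:
    """Illustrative MultiKrum on plain lists (hypothetical helper, not the library code)."""
    n = len(gradients)
    k = n - f - 1  # number of closest neighbours summed into each score

    scores = []
    for i, g in enumerate(gradients):
        # Sorted Euclidean distances from gradient i to every other gradient.
        dists = sorted(math.dist(g, h) for j, h in enumerate(gradients) if j != i)
        scores.append(sum(dists[:k]))

    # Indices of the m smallest scores; ties broken by index (an assumption).
    selected = sorted(range(n), key=lambda i: (scores[i], i))[:m]

    # Coordinate-wise mean of the selected gradients.
    d = len(gradients[0])
    return [sum(gradients[i][c] for i in selected) / m for c in range(d)]


# Four honest-looking gradients and one obvious outlier (index 4).
grads = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.0], [1.0, 0.1], [100.0, 100.0]]
averaged = multikrum_sketch(grads, f=1, m=2)   # mean of the two best-scored gradients
krum_like = multikrum_sketch(grads, f=1, m=1)  # m = 1 returns a single gradient, as in Krum
```

Because the outlier is far from every other gradient, its score is the largest and it is never among the selected gradients in this example.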

classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, m: int, **specialized: Any) → Tensor

Aggregate the worker gradients with the MultiKrum rule, returning the mean of the \(m\) lowest-scoring gradients.

Parameters:
  • gradients – Sequence of 1-D tensors containing gradients from workers.

  • out – Optional pre-allocated tensor to write the result into.

  • n – Total number of workers.

  • f – Number of Byzantine workers to tolerate. Must satisfy \(1 \le f \le \lfloor (n - 3) / 2 \rfloor\).

  • m – Number of selected gradients to average. Must satisfy \(1 \le m \le n - f - 2\).

  • **specialized – Additional keyword arguments.

Returns:

Aggregated gradient of shape (d,), where d is the dimension of each input gradient.

Raises:

ValueError – If \(n\), \(f\), or \(m\) violates the constraints above, or if the number of gradients is invalid.
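As a rough sketch, the argument checks behind this error could look like the following. The helper name `check_multikrum_args` is hypothetical, and reading "gradients count is invalid" as "the number of gradients differs from n" is an assumption; the bounds on f and m are the ones listed under Parameters.

```python
from typing import Sequence


def check_multikrum_args(gradients: Sequence, n: int, f: int, m: int) -> None:
    """Hypothetical validator mirroring the documented constraints."""
    # Assumption: an invalid gradients count means len(gradients) != n.
    if len(gradients) != n:
        raise ValueError(f"expected {n} gradients, got {len(gradients)}")
    # Documented bound: 1 <= f <= (n - 3) // 2.
    if not 1 <= f <= (n - 3) // 2:
        raise ValueError(f"f={f} must satisfy 1 <= f <= {(n - 3) // 2} for n={n}")
    # Documented bound: 1 <= m <= n - f - 2.
    if not 1 <= m <= n - f - 2:
        raise ValueError(f"m={m} must satisfy 1 <= m <= {n - f - 2} for n={n}, f={f}")
```

For example, with n = 5 the only admissible f is 1, which in turn allows m = 1 or m = 2; n = 4 admits no valid f at all because (4 - 3) // 2 = 0.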

See also

For single-gradient selection, see Krum. For stronger two-stage resilience, see Bulyan.