Krum

Krum aggregation rule, single-gradient selection.

Reference:

Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. “Machine learning with adversaries: Byzantine tolerant gradient descent.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).

class aggregators.krum.Krum

Bases: MultiKrum

Krum aggregation rule, single-gradient selection.

For each worker gradient, Krum computes a score: the sum of the squared Euclidean distances to its \(n - f - 2\) closest neighbors. It returns the gradient with the smallest score, i.e. the one that lies closest to its neighbors and is therefore most consistent with the honest workers. This is MultiKrum with \(m = 1\).

classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, **specialized: Any) -> Tensor

Aggregate the gradients.

Parameters:
  • gradients – Sequence of 1-D tensors containing gradients from workers.

  • out – Optional pre-allocated tensor to write the result into.

  • n – Total number of workers.

  • f – Number of Byzantine workers to tolerate.

  • **specialized – Additional keyword arguments.

Returns:

Aggregated gradient of shape `(d,)`.

Raises:

ValueError – If \(n < 1\), \(f < 0\), \(f > n\), \(n < 2f + 3\), or len(gradients) != n.
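The constraint \(n \geq 2f + 3\) bounds how many Byzantine workers a given pool can tolerate: \(f \leq \lfloor (n - 3) / 2 \rfloor\). The sketch below mirrors the documented conditions under that reading. Both function names are hypothetical and not part of this library, and the error messages are illustrative.

```python
def max_tolerable_f(n: int) -> int:
    """Largest f satisfying n >= 2f + 3 (hypothetical helper)."""
    if n < 3:
        raise ValueError("n >= 2f + 3 needs at least 3 workers, even for f = 0")
    return (n - 3) // 2


def check_krum_args(num_gradients: int, n: int, f: int) -> None:
    """Raise ValueError under the conditions listed above (illustrative only)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if f < 0 or f > n:
        raise ValueError("f must satisfy 0 <= f <= n")
    if n < 2 * f + 3:
        raise ValueError("Krum requires n >= 2f + 3")
    if num_gradients != n:
        raise ValueError("expected exactly n gradients")


# With 10 workers, at most 3 Byzantine workers can be tolerated (2*3 + 3 = 9 <= 10).
limit = max_tolerable_f(10)
check_krum_args(num_gradients=10, n=10, f=limit)  # passes silently
```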

See also

For multi-gradient averaging, see MultiKrum. For stronger two-stage resilience, see Bulyan.