MultiKrum

MultiKrum aggregation rule, multi-gradient averaging.

Reference: Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. “Machine learning with adversaries: Byzantine tolerant gradient descent.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).

class aggregators.multikrum.MultiKrum

Bases: Aggregator

MultiKrum aggregation rule, multi-gradient averaging.

Scores every worker gradient by the sum of its distances to its \(n - f - 1\) closest neighbors, picks the \(m\) gradients with the smallest scores, and returns their mean. With \(m = 1\) it reduces to Krum.
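The scoring and selection described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `multikrum_sketch` is hypothetical, gradients are plain lists of floats rather than tensors, distances are taken to be Euclidean (the referenced paper uses squared distances), and ties are broken by worker index.

```python
import math
from typing import List, Sequence


def multikrum_sketch(
    gradients: Sequence[Sequence[float]], f: int, m: int
) -> List[float]:
    """Hypothetical helper: average the m gradients with the lowest scores."""
    n = len(gradients)
    d = len(gradients[0])

    # Pairwise Euclidean distances between all worker gradients
    # (assumption: the library may use a different norm or its square).
    dist = [[math.dist(gi, gj) for gj in gradients] for gi in gradients]

    # Score of worker i: sum of the distances to its n - f - 1 closest
    # neighbors, excluding the gradient itself.
    scores = []
    for i in range(n):
        neighbors = sorted(dist[i][j] for j in range(n) if j != i)
        scores.append(sum(neighbors[: n - f - 1]))

    # Keep the m workers with the smallest scores (ties broken by index).
    selected = sorted(range(n), key=lambda i: scores[i])[:m]

    # Coordinate-wise mean of the selected gradients.
    return [sum(gradients[i][k] for i in selected) / m for k in range(d)]


# Four agreeing workers and one outlier; n = 5, f = 1, m = 2.
grads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
result = multikrum_sketch(grads, f=1, m=2)
```

In this toy input the outlier receives by far the largest score, so it is never among the selected gradients and the mean is taken over agreeing workers only.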

classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, m: int, **specialized: Any) → Tensor

Aggregate the gradients.

Parameters:

gradients – Sequence of 1-D tensors containing gradients from workers.
out – Optional pre-allocated tensor to write the result into.
n – Total number of workers.
f – Number of Byzantine workers to tolerate. Must satisfy \(1 \le f \le \lfloor (n - 3) / 2 \rfloor\).
m – Number of selected gradients to average. Must satisfy \(1 \le m \le n - f - 2\).
**specialized – Additional keyword arguments.

Returns:

Aggregated gradient of shape (d,).

Raises:

ValueError – If \(n\), \(f\), \(m\), or the number of gradients is invalid.
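The documented bounds on f and m can be checked with a few lines of Python. This is only a sketch of the constraints as stated under Parameters: the helper name `check_multikrum_args` is hypothetical, and the requirement that the number of gradients equals n is an assumption about what "invalid" means for the gradient count.

```python
def check_multikrum_args(num_gradients: int, n: int, f: int, m: int) -> None:
    """Hypothetical helper: raise ValueError if the documented bounds fail."""
    # Assumption: one gradient is expected per worker.
    if num_gradients != n:
        raise ValueError(f"expected {n} gradients, got {num_gradients}")
    # Documented bound: 1 <= f <= (n - 3) // 2.
    if not 1 <= f <= (n - 3) // 2:
        raise ValueError(f"f={f} is outside [1, {(n - 3) // 2}] for n={n}")
    # Documented bound: 1 <= m <= n - f - 2.
    if not 1 <= m <= n - f - 2:
        raise ValueError(f"m={m} is outside [1, {n - f - 2}] for n={n}, f={f}")


# With n = 7 workers, f may be at most 2, and with f = 2, m may be at most 3.
check_multikrum_args(7, n=7, f=2, m=3)
```

Under these bounds the smallest admissible configuration is n = 5 with f = 1, which leaves m between 1 and 2.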