MultiKrum
MultiKrum aggregation rule, multi-gradient averaging.
- Reference:
Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. “Machine learning with adversaries: Byzantine tolerant gradient descent.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).
- class aggregators.multikrum.MultiKrum
Bases: Aggregator
MultiKrum aggregation rule, multi-gradient averaging.
Scores every worker gradient by the sum of its distances to its \(n - f - 1\) closest neighbors, picks the \(m\) gradients with the smallest scores, and returns their mean. With \(m = 1\) it reduces to Krum.
- classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, m: int, **specialized: Any) → Tensor
Aggregate the gradients.
- Parameters:
gradients – Sequence of 1-D tensors containing gradients from workers.
out – Optional pre-allocated tensor to write the result into.
n – Total number of workers.
f – Number of Byzantine workers to tolerate. Must satisfy 1 <= f <= (n - 3) // 2.
m – Number of selected gradients to average. Must satisfy \(1 \le m \le n - f - 2\).
**specialized – Additional keyword arguments.
- Returns:
Aggregated gradient of shape (d,), where d is the length of each input gradient.
- Raises:
ValueError – If \(n\), \(f\), or \(m\) is invalid, or if the number of gradients is invalid.
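The scoring, selection, and averaging steps described for this class can be sketched without any tensor library. The function below is a minimal illustration, not the library's implementation: `multikrum_sketch` is a hypothetical name, gradients are plain Python lists rather than tensors, and it assumes that exactly \(n\) gradients must be supplied. It uses plain Euclidean distances, as the class description states; the referenced paper defines the score with squared distances, so swap in the squared distance if that is the intended variant.

```python
import math
from typing import List, Sequence


def multikrum_sketch(
    gradients: Sequence[Sequence[float]], n: int, f: int, m: int
) -> List[float]:
    """Illustrative MultiKrum over plain Python lists (hypothetical helper)."""
    # Constraints as documented for the parameters above.
    # Assumption: the "gradients count" check means len(gradients) == n.
    if len(gradients) != n:
        raise ValueError("expected exactly n gradients")
    if not 1 <= f <= (n - 3) // 2:
        raise ValueError("f must satisfy 1 <= f <= (n - 3) // 2")
    if not 1 <= m <= n - f - 2:
        raise ValueError("m must satisfy 1 <= m <= n - f - 2")

    scores = []
    for i, gi in enumerate(gradients):
        # Euclidean distances from gradient i to every other gradient, ascending.
        dists = sorted(
            math.dist(gi, gj) for j, gj in enumerate(gradients) if j != i
        )
        # Score: sum of the distances to the n - f - 1 closest neighbors.
        scores.append(sum(dists[: n - f - 1]))

    # Indices of the m smallest scores (ties resolved by worker index).
    selected = sorted(range(n), key=lambda i: scores[i])[:m]

    # Coordinate-wise mean of the selected gradients.
    d = len(gradients[0])
    return [sum(gradients[i][k] for i in selected) / m for k in range(d)]
```

For example, with four honest-looking gradients and one outlier, `multikrum_sketch([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [3.0, 0.0], [100.0, 100.0]], n=5, f=1, m=2)` returns `[1.0, 0.0]`: the outlier receives the largest score and is never selected.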