Bulyan
Bulyan aggregation rule, two-stage Krum + trimmed mean.
- Reference:
El Mahdi El Mhamdi, Rachid Guerraoui, and Sébastien Rouault. “The Hidden Vulnerability of Distributed Learning in Byzantium.” In Proceedings of the 35th International Conference on Machine Learning (ICML 2018).
- class aggregators.bulyan.Bulyan
Bases: Aggregator
Bulyan aggregation rule, two-stage Krum + trimmed mean.
Bulyan first applies a Krum-style scoring rule iteratively to select a set S of θ = n − 2f gradients. Each iteration picks the gradient with the lowest Krum score among the still-unselected gradients and removes it from the candidate pool. Bulyan then aggregates S coordinate-wise: for each coordinate, it takes the median over S and averages the β = θ − 2f = n − 4f values closest to that median.
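The two stages described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not this class's implementation: the names `krum_scores` and `bulyan` are hypothetical, the sketch uses NumPy arrays rather than `Tensor`, it selects exactly one gradient per iteration and does not model the `m` parameter, and the clamp to at least one neighbour in the score is an assumption made so the score stays defined as the pool shrinks.

```python
import numpy as np


def krum_scores(grads: np.ndarray, f: int) -> np.ndarray:
    """Krum-style score per row: sum of squared distances to its nearest neighbours."""
    k = grads.shape[0]
    diffs = grads[:, None, :] - grads[None, :, :]   # (k, k, d)
    d2 = np.einsum("ijd,ijd->ij", diffs, diffs)     # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # a gradient is not its own neighbour
    # Assumption: clamp to at least one neighbour, because k - f - 2 can
    # reach 0 as the pool shrinks (e.g. f = 1, k = 3).
    closest = max(k - f - 2, 1)
    return np.sort(d2, axis=1)[:, :closest].sum(axis=1)


def bulyan(grads: np.ndarray, f: int) -> np.ndarray:
    """Hypothetical Bulyan(Krum) sketch over an (n, d) array of gradients."""
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[0]
    if f < 0 or n < 4 * f + 3:
        raise ValueError("need n >= 4f + 3")
    theta = n - 2 * f        # size of the selection set S
    beta = theta - 2 * f     # values averaged per coordinate

    # Stage 1: pick the lowest-score gradient, remove it from the pool, repeat.
    pool = list(range(n))
    selected = []
    for _ in range(theta):
        scores = krum_scores(grads[pool], f)
        selected.append(pool.pop(int(np.argmin(scores))))

    # Stage 2: per coordinate, average the beta values closest to the median of S.
    S = grads[selected]                                   # (theta, d)
    med = np.median(S, axis=0)                            # (d,)
    order = np.argsort(np.abs(S - med), axis=0)[:beta]    # (beta, d)
    return np.take_along_axis(S, order, axis=0).mean(axis=0)
```

With n = 7 and f = 1, the sketch selects θ = 5 gradients and averages β = 3 values per coordinate, so a single far-away gradient is left out of S and does not move the result.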
The paper studies Bulyan(A) with A = Krum, i.e. Bulyan(Krum), in all figures; this implementation follows that choice. A different base rule A could be plugged in by overriding _select_one().
- classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, m: int | None = None, **specialized: Any) → Tensor
Aggregate the gradients.
- Parameters:
gradients – Sequence of 1-D tensors containing gradients from workers.
out – Optional pre-allocated tensor to write the result into.
n – Total number of workers. Must satisfy \(n \ge 4f + 3\).
f – Number of Byzantine workers to tolerate. Must satisfy 1 <= f <= (n - 3) // 4.
m – Number of gradients selected by MultiKrum at each iteration. Defaults to \(n - f - 2\).
**specialized – Additional keyword arguments.
- Returns:
Aggregated gradient of shape (d,).
- Raises:
ValueError – If \(n\), \(f\), \(m\), or the gradients count is invalid.
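The constraints on n, f, and m listed above can be checked before calling aggregate. The helper below is a minimal sketch with a hypothetical name (`check_bulyan_args`); it is not part of this library. For integers, n ≥ 4f + 3 is equivalent to f ≤ (n − 3) // 4, so a single comparison covers both documented bounds. The lower bound of 1 on m is an assumption, since the valid range of m is not stated here.

```python
from typing import Optional


def check_bulyan_args(n: int, f: int, m: Optional[int] = None) -> int:
    """Validate (n, f, m) against the documented constraints; return the effective m."""
    # 1 <= f <= (n - 3) // 4 is the integer form of n >= 4f + 3 with f >= 1.
    if f < 1 or f > (n - 3) // 4:
        raise ValueError(f"need 1 <= f <= (n - 3) // 4, got n={n}, f={f}")
    if m is None:
        m = n - f - 2  # documented default
    if m < 1:  # assumption: the documentation does not state the valid range of m
        raise ValueError(f"m must be positive, got m={m}")
    return m
```

For example, n = 7 and f = 1 pass the check and give the default m = 4, while n = 6 and f = 1 are rejected because (6 − 3) // 4 = 0.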