Bulyan

Bulyan aggregation rule, two-stage Krum + trimmed mean.

Reference:

El Mahdi El Mhamdi, Rachid Guerraoui, and Sébastien Rouault. “The Hidden Vulnerability of Distributed Learning in Byzantium.” In Proceedings of the 35th International Conference on Machine Learning (ICML 2018).

class aggregators.bulyan.Bulyan

Bases: Aggregator

Bulyan aggregation rule, two-stage Krum + trimmed mean.

Bulyan first applies a Krum-style scoring rule iteratively to select a set S of θ = n − 2f gradients: each iteration picks the gradient with the lowest Krum score among those still unselected and removes it from the candidate pool. It then aggregates S coordinate-wise: for each coordinate, it takes the median over S and averages the β = θ − 2f = n − 4f values closest to that median.
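The two stages described above can be sketched in plain Python. This is an illustrative sketch only, not this class's implementation: the names krum_scores and bulyan_sketch are hypothetical, gradients are plain lists of floats rather than tensors, and clamping the neighbour count to at least 1 once the candidate pool becomes small is an assumption made here so that scores stay meaningful.

```python
import statistics


def krum_scores(grads, f):
    """Krum-style score: sum of squared distances to the nearest neighbours."""
    # Assumption: use at least one neighbour, even when the pool is small.
    k = max(1, len(grads) - f - 2)
    scores = []
    for i, g in enumerate(grads):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(g, h))
            for j, h in enumerate(grads)
            if j != i
        )
        scores.append(sum(dists[:k]))
    return scores


def bulyan_sketch(grads, f):
    """Hypothetical two-stage Bulyan over a list of equal-length float lists."""
    n = len(grads)
    if f < 1 or n < 4 * f + 3:
        raise ValueError("need f >= 1 and n >= 4f + 3")
    theta = n - 2 * f      # size of the selection set S
    beta = theta - 2 * f   # values averaged per coordinate (n - 4f)

    # Stage 1: repeatedly move the lowest-scoring gradient from the pool to S.
    pool = list(grads)
    selected = []
    for _ in range(theta):
        scores = krum_scores(pool, f)
        best = min(range(len(pool)), key=scores.__getitem__)
        selected.append(pool.pop(best))

    # Stage 2: per coordinate, average the beta values closest to the median.
    out = []
    for column in zip(*selected):
        med = statistics.median(column)
        closest = sorted(column, key=lambda v: abs(v - med))[:beta]
        out.append(sum(closest) / beta)
    return out
```

With n = 7 and f = 1, the sketch selects θ = 5 gradients and averages the β = 3 values nearest the median in each coordinate, so a single extreme gradient cannot pull the result outside the range of the honest values.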

The paper studies Bulyan(A) with A = Krum (Bulyan(Krum)) in all figures; this implementation follows that choice. A different base rule A could be plugged in by overriding _select_one().

classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, n: int, f: int, m: int | None = None, **specialized: Any) Tensor

Aggregate the gradients.

Parameters:
  • gradients – Sequence of 1-D tensors containing gradients from workers.

  • out – Optional pre-allocated tensor to write the result into.

  • n – Total number of workers. Must satisfy \(n \ge 4f + 3\).

  • f – Number of Byzantine workers to tolerate. Must satisfy 1 <= f <= (n - 3) // 4.

  • m – Number of gradients selected by MultiKrum at each iteration. Defaults to \(n - f - 2\).

  • **specialized – Additional keyword arguments.

Returns:

Aggregated gradient of shape (d,).

Raises:

ValueError – If \(n\), \(f\), or \(m\) is invalid, or if the number of gradients is invalid.
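The constraints on n and f listed under Parameters can be checked as in the following sketch. The helper name check_bulyan_params is hypothetical, and requiring the gradient count to equal n is an assumption about what "invalid" means here; validation of m is omitted because its valid range is not stated above.

```python
def check_bulyan_params(n, f, num_gradients):
    """Hypothetical validator; returns (theta, beta) when the inputs are valid."""
    # Assumption: exactly one gradient is expected per worker.
    if num_gradients != n:
        raise ValueError(f"expected {n} gradients, got {num_gradients}")
    max_f = (n - 3) // 4  # largest f satisfying n >= 4f + 3
    if not 1 <= f <= max_f:
        raise ValueError(f"f must satisfy 1 <= f <= {max_f} for n = {n}")
    theta = n - 2 * f     # gradients kept by the selection stage
    beta = theta - 2 * f  # values averaged per coordinate
    return theta, beta
```

For example, n = 11 and f = 2 pass the check and give θ = 7 and β = 3, whereas n = 10 and f = 2 are rejected because (10 − 3) // 4 = 1.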

See also

Built on top of Krum and MultiKrum.