Trimmed Mean
Trimmed mean aggregation rule, coordinate-wise.
- Reference:
Dong Yin, Yudong Chen, Kannan Ramchandran, and Peter Bartlett. “Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates.” In Proceedings of the 35th International Conference on Machine Learning (ICML 2018).
- class aggregators.trimmed_mean.TrimmedMean
Bases: Aggregator
Trimmed mean aggregation rule, coordinate-wise.
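For every coordinate, the \(f\) smallest and \(f\) largest values are dropped, and the remaining \(n - 2f\) values are averaged, where \(n\) is the number of workers. This requires at least \(2f + 1\) workers. The rule provides basic Byzantine resilience: if at most \(f\) workers are adversarial, every value that survives the trimming lies within the range of the honest workers' values for that coordinate, so adversarial workers cannot pull a coordinate arbitrarily far.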
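As a rough illustration of the rule described above, the following plain-Python sketch sorts each coordinate, drops the \(f\) smallest and \(f\) largest values, and averages the rest. The helper name trimmed_mean_sketch and the list-of-lists input are assumptions made for this example; it is not the implementation behind TrimmedMean.aggregate, which operates on tensors.

```python
from typing import List, Sequence


def trimmed_mean_sketch(gradients: Sequence[Sequence[float]], f: int) -> List[float]:
    """Coordinate-wise trimmed mean (hypothetical helper, not the library's code)."""
    n = len(gradients)
    if f < 0 or n <= 2 * f:
        raise ValueError(f"need f >= 0 and more than 2f gradients, got n={n}, f={f}")
    result = []
    # zip(*gradients) yields, for each coordinate, the values reported by all workers.
    for column in zip(*gradients):
        ordered = sorted(column)
        kept = ordered[f:n - f]  # drop the f smallest and the f largest values
        result.append(sum(kept) / len(kept))
    return result


# Four honest workers and one adversarial worker (the last row), with f = 1.
gradients = [
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [1000.0, -1000.0],
]
aggregated = trimmed_mean_sketch(gradients, f=1)
print(aggregated)  # [3.0, 11.0]
```

In this example the adversarial values 1000.0 and -1000.0 are both trimmed, so the result is the mean of honest values only. When an adversarial value is not extreme enough to be trimmed, it still lies between honest values, which is what bounds its influence.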
- classmethod aggregate(gradients: Sequence[Tensor] | Tensor, /, out: Tensor | None = None, *, f: int, **specialized: Any) Tensor
Aggregate the gradients.
- Parameters:
gradients – Sequence of 1-D tensors containing gradients from workers.
out – Optional pre-allocated tensor to write the result into.
f – Number of Byzantine workers to tolerate. Must satisfy \(0 \le f\) and len(gradients) > 2f.
**specialized – Additional keyword arguments.
- Returns:
Coordinate-wise trimmed mean of the gradients, of shape (d,), where d is the length of each gradient.
- Raises:
ValueError – If \(f\) is negative or if there are not enough gradients to trim (len(gradients) <= 2f).
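The two error conditions can be made concrete with a small check. The sketch below uses a hypothetical helper, validate_trim_arguments, that only mirrors the documented conditions; the library's actual checks and error messages may differ.

```python
def validate_trim_arguments(num_gradients: int, f: int) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if f < 0:
        raise ValueError(f"f must be non-negative, got f={f}")
    if num_gradients <= 2 * f:
        # After dropping f values from each end, nothing would be left to average.
        raise ValueError(
            f"not enough gradients to trim: got {num_gradients}, need more than {2 * f}"
        )


validate_trim_arguments(5, 2)  # accepted: 5 > 2 * 2, one value remains per coordinate

try:
    validate_trim_arguments(4, 2)  # rejected: 4 <= 2 * 2, trimming would remove everything
except ValueError as err:
    message = str(err)
    print(message)
```

With 5 gradients and f = 2 exactly one value survives per coordinate, so the trimmed mean reduces to the coordinate-wise median in that boundary case.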