Byzantine-Robust Optimization under $(L_0, L_1)$-Smoothness

Arman Bolatov; Samuel Horváth; Martin Takáč; Eduard Gorbunov

arXiv:2603.12512·cs.LG·March 16, 2026

Open Access

TL;DR

This paper introduces Byz-NSGDM, a distributed optimization algorithm that tolerates Byzantine workers under the generalized $(L_0, L_1)$-smoothness condition, with a proven convergence rate and experimental validation.

Contribution

The paper presents Byz-NSGDM, a Byzantine-robust normalized stochastic gradient method with momentum for $(L_0, L_1)$-smooth objectives. It combines momentum normalization with robust aggregation enhanced by Nearest Neighbor Mixing (NNM).
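
For reference, $(L_0, L_1)$-smoothness is commonly stated for twice-differentiable $f$ in the following second-order form. This is the widely used formulation from the generalized-smoothness literature; the paper may work with a first-order variant, so treat the exact statement as an assumption rather than the authors' definition.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x.
$$

Setting $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$, while $L_1 > 0$ lets the local curvature bound grow with the gradient norm.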

Findings

1. Achieves an $O(K^{-1/4})$ convergence rate, up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity.
2. Remains effective against a range of Byzantine attack strategies in experiments.
3. Is robust across different momentum and learning-rate settings.

Abstract

We consider distributed optimization under Byzantine attacks in the presence of $(L_0, L_1)$-smoothness, a generalization of standard $L$-smoothness that captures functions with state-dependent gradient Lipschitz constants. We propose Byz-NSGDM, a normalized stochastic gradient descent method with momentum that achieves robustness against Byzantine workers while maintaining convergence guarantees. Our algorithm combines momentum normalization with Byzantine-robust aggregation enhanced by Nearest Neighbor Mixing (NNM) to handle both the challenges posed by $(L_0, L_1)$-smoothness and Byzantine adversaries. We prove that Byz-NSGDM achieves a convergence rate of $O(K^{-1/4})$ up to a Byzantine bias floor proportional to the robustness coefficient and gradient heterogeneity. Experimental validation on heterogeneous MNIST classification, synthetic $(L_0, L_1)$-smooth optimization, and…
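
The ingredients named in the abstract (worker momentum, NNM pre-aggregation, a robust aggregator, and a normalized step) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `nnm`, `robust_aggregate`, and `byz_nsgdm_sketch` are hypothetical, the coordinate-wise median is an assumed choice of robust aggregator, momentum is assumed to be kept on the workers, and the random-vector attack and noise level are invented for the demo.

```python
import numpy as np

def nnm(vectors, f):
    """Nearest Neighbor Mixing: replace each vector by the mean of its
    n - f nearest neighbors (itself included), with f the assumed
    number of Byzantine workers."""
    n = len(vectors)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # For each row, indices of the n - f closest vectors.
    nearest = np.argsort(dists, axis=1)[:, : n - f]
    return vectors[nearest].mean(axis=1)

def robust_aggregate(vectors, f):
    """NNM followed by coordinate-wise median (assumed aggregator)."""
    return np.median(nnm(vectors, f), axis=0)

def byz_nsgdm_sketch(grad_fn, x0, n_workers, f,
                     steps=200, lr=0.05, beta=0.9, seed=0):
    """Normalized SGD with momentum and robust aggregation (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    momenta = np.zeros((n_workers, x.size))
    for _ in range(steps):
        for i in range(n_workers):
            if i < f:
                # Hypothetical attack: send a large random vector.
                momenta[i] = 100.0 * rng.standard_normal(x.size)
            else:
                # Honest worker: noisy gradient, then momentum update.
                g = grad_fn(x) + 0.1 * rng.standard_normal(x.size)
                momenta[i] = beta * momenta[i] + (1.0 - beta) * g
        agg = robust_aggregate(momenta, f)
        norm = np.linalg.norm(agg)
        if norm > 0:
            # Normalized step: only the direction of the aggregate is used.
            x = x - lr * agg / norm
    return x
```

For example, `byz_nsgdm_sketch(lambda x: x, [3.0, -2.0], n_workers=10, f=3)` runs the sketch on the quadratic $\frac{1}{2}\|x\|^2$ with three simulated Byzantine workers. Because the step is normalized, its length is always `lr` regardless of gradient magnitude, which is the property that makes normalized methods a natural fit when the local smoothness constant grows with the gradient norm.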

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning