Decentralized Nonsmooth Nonconvex Optimization with Client Sampling
Xinyan Chen, Weiguo Gao, Luo Luo

TL;DR
This paper introduces a new decentralized stochastic method for nonsmooth nonconvex optimization with client sampling, achieving optimal sample complexity and improved communication efficiency, and extends to zeroth-order optimization with empirical validation.
Contribution
It proposes a novel stochastic first-order method with client sampling for decentralized nonsmooth nonconvex problems, achieving optimal sample complexity and sharper communication bounds.
Findings
Achieves optimal sample complexity of $\mathcal{O}(\frac{1}{\delta \epsilon^3})$
Reduces communication rounds compared to existing methods
Demonstrates empirical advantages through numerical experiments
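For context, $\delta$ and $\epsilon$ in the sample-complexity bound are the parameters of the Goldstein stationarity notion named in the abstract. The standard definition from the nonsmooth optimization literature is sketched here under the assumption that the global objective is the average $f = \frac{1}{n}\sum_{i=1}^n f_i$ of the $n$ clients' Lipschitz local functions. The symbols $f$, $f_i$, $n$, and $\mathbb{B}_\delta(x)$ (the closed ball of radius $\delta$ around $x$) are notation chosen for this note and may differ from the paper's; $\partial f$ denotes the Clarke subdifferential.

$$
\partial_\delta f(x) := \mathrm{conv}\Big( \bigcup_{y \in \mathbb{B}_\delta(x)} \partial f(y) \Big),
\qquad
x \text{ is } (\delta,\epsilon)\text{-Goldstein stationary if } \min_{g \in \partial_\delta f(x)} \|g\| \le \epsilon .
$$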
Abstract
This paper considers the decentralized nonsmooth nonconvex optimization problem with Lipschitz continuous local functions. We propose an efficient stochastic first-order method with client sampling that finds a $(\delta, \epsilon)$-Goldstein stationary point with an overall sample complexity of $\mathcal{O}(\delta^{-1}\epsilon^{-3})$. The method also comes with bounds on the number of computation rounds and the number of communication rounds, with the spectral gap of the mixing matrix for the network entering the bounds. Our results achieve the optimal sample complexity and a sharper communication complexity than existing methods. We also extend our ideas to zeroth-order optimization. Moreover, numerical experiments show the empirical advantage of our methods.
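The two ingredients named in the abstract, a mixing matrix with a spectral gap and client sampling, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's algorithm: the ring topology with weights 1/3, the convention that the spectral gap is one minus the second-largest eigenvalue magnitude, the helper names, and the choice to rescale the sampled client's gradient by $n$ so that the network average stays unbiased.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n clients (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def spectral_gap(W):
    """Assumed convention: 1 minus the second-largest eigenvalue magnitude of a symmetric W."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - mags[1]

def sampled_client_messages(local_grads, rng):
    """Hypothetical client-sampling step: one uniformly sampled client reports.

    Scaling its gradient by n makes the row-average of the returned matrix an
    unbiased estimate of the average of all local gradients.
    """
    n, d = local_grads.shape
    i = rng.integers(n)
    G = np.zeros((n, d))
    G[i] = n * local_grads[i]
    return G, i

rng = np.random.default_rng(0)
n, d = 8, 3
W = ring_mixing_matrix(n)
gap = spectral_gap(W)

local_grads = rng.normal(size=(n, d))  # stand-in for the clients' stochastic gradients
G, i = sampled_client_messages(local_grads, rng)

# Plain gossip: repeated multiplication by W drives every row to the row-average of G,
# which equals the sampled client's gradient local_grads[i].
mixed = G.copy()
for _ in range(100):
    mixed = W @ mixed
```

In this sketch only one client touches its gradient oracle per round, while communication spreads that information across the network; a smaller spectral gap means more gossip rounds are needed for the rows of `mixed` to agree.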
Peer Reviews
Decision: Submitted to ICLR 2026
1. The proposed results have theoretical guarantees. 2. The proposed algorithm has significantly better performance compared to existing algorithms (see Table 1).
**Minor comments:**
P.18 line 966. $g_i^{k,t}$ is not defined in Alg. 1 and Theorem 1; only $g_{i^t}^{k,t}$ is defined.
P.23 line 1231. Can you explain why you assume $\delta < 1$? I think we can't select $\delta$.
**Typos:**
P.19 line 978. $y_i^{k, t-1/2} \to y_i^{k, t-1}$
P.19 line 984. $2n \to n$
P.19 line 1024. $x^{k,t-1} \to \overline{x}^{k, t-1}$
P.24 line 1262. $\sqrt{T} \to 1$
P.24 line 1295. $\nu + L \to \nu + L\delta$, also for Theorem 3
P.25 line 1319. $4 \to 8$
P.25 line 1319. $k
1. The paper solves the decentralized nonsmooth nonconvex optimization problem with a sharper upper bound than existing methods. 2. It incorporates client sampling and Chebyshev acceleration into the framework of online-to-nonconvex conversion, so that not every client has to access its local oracle in each computation round, which reduces the computation cost. 3. The experimental results support the theoretical findings.
1. The numerical experiments are limited to only two models. More experiments on different models and more recent datasets would strengthen the empirical validation of the proposed method.
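To make the Chebyshev acceleration mentioned in this review concrete, here is a minimal sketch comparing plain gossip with one accelerated-gossip recursion from the consensus literature that is often described as Chebyshev-type acceleration. It is an illustration under assumptions, not the paper's procedure: the ring network, the function names, and the specific momentum weight `eta` are choices made for this example, and the paper's exact variant may differ.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n clients (illustrative choice)."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def plain_gossip(W, X, rounds):
    """Standard gossip averaging: X <- W X, one communication round per step."""
    for _ in range(rounds):
        X = W @ X
    return X

def accelerated_gossip(W, X, rounds):
    """Momentum-based gossip: X_{k+1} = (1 + eta) W X_k - eta X_{k-1}.

    eta is set from the second-largest eigenvalue magnitude of W; this is an
    assumed, commonly used choice rather than a value taken from the paper.
    """
    lam2 = np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]
    s = np.sqrt(1.0 - lam2 ** 2)
    eta = (1.0 - s) / (1.0 + s)
    prev, cur = X, X
    for _ in range(rounds):
        prev, cur = cur, (1.0 + eta) * (W @ cur) - eta * prev
    return cur

def consensus_error(X):
    """Distance of the clients' rows from their common average."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
W = ring_mixing_matrix(20)
X0 = rng.normal(size=(20, 5))  # one row of local variables per client

X_plain = plain_gossip(W, X0, 30)
X_fast = accelerated_gossip(W, X0, 30)
err_plain = consensus_error(X_plain)
err_fast = consensus_error(X_fast)
```

Both recursions preserve the network average, and each step costs one communication round. On a poorly connected graph such as this ring, the accelerated recursion reaches a given consensus error in fewer rounds, which is the mechanism by which acceleration can reduce the dependence of communication complexity on the spectral gap.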
1. Introduces the first decentralized nonsmooth nonconvex method supporting client sampling, enhancing the scalability of the proposed algorithm. 2. Achieves optimal sample complexity and sharper communication bounds than other state-of-the-art methods. 3. Extends the proposed algorithm to the settings of zeroth-order optimization with dimension-dependent complexity $\mathcal{O}(d\delta^{-1}\epsilon^{-3})$. 4. Theoretical results are well-grounded and consistent with known lower bounds. 5.
Assumptions (Lipschitz continuity, fixed mixing matrix) may be restrictive in heterogeneous or time-varying networks.
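The zeroth-order extension noted in this review replaces gradient queries with function-value queries. A minimal sketch of the standard two-point randomized-smoothing estimator from the zeroth-order literature follows; whether the paper uses exactly this estimator is an assumption, and the test function, the point `x`, and all names are hypothetical. The factor `d` in the estimator is one way to see where a dimension-dependent complexity can come from.

```python
import numpy as np

def two_point_estimator(f, x, delta, rng):
    """Two-point gradient estimate of the delta-smoothed surrogate of f.

    w is drawn uniformly from the unit sphere; only two function values are used.
    """
    d = x.shape[0]
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    return (d / (2.0 * delta)) * (f(x + delta * w) - f(x - delta * w)) * w

def l1_norm(x):
    """Nonsmooth, Lipschitz test function (illustrative choice)."""
    return np.abs(x).sum()

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 0.5])  # every coordinate is farther than delta from the kink at 0
delta = 0.1

samples = np.array([two_point_estimator(l1_norm, x, delta, rng) for _ in range(20000)])
g_hat = samples.mean(axis=0)
# Because l1_norm is linear within distance delta of this x, the averaged
# estimate should be close to sign(x).
```

Each individual estimate is noisy, with magnitude that can scale with `d`, so many samples are averaged here purely to make the unbiasedness visible.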
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques
