Decentralized Nonsmooth Nonconvex Optimization with Client Sampling

Xinyan Chen; Weiguo Gao; Luo Luo

arXiv:2601.19381·math.OC·January 28, 2026


TL;DR

This paper introduces a new decentralized stochastic method for nonsmooth nonconvex optimization with client sampling, achieving optimal sample complexity and improved communication efficiency, and extends to zeroth-order optimization with empirical validation.

Contribution

It proposes a novel stochastic first-order method with client sampling for decentralized nonsmooth nonconvex problems, achieving optimal sample complexity and sharper communication bounds.

Findings

01

Achieves optimal sample complexity of $O(\frac{1}{\delta \epsilon^3})$

02

Reduces communication rounds compared to existing methods

03

Demonstrates empirical advantages through numerical experiments

Abstract

This paper considers the decentralized nonsmooth nonconvex optimization problem with Lipschitz continuous local functions. We propose an efficient stochastic first-order method with client sampling that finds a $(\delta, \epsilon)$-Goldstein stationary point with an overall sample complexity of $O(\delta^{-1} \epsilon^{-3})$, $O(\delta^{-1} \epsilon^{-3})$ computation rounds, and $\tilde{O}(\gamma^{-1/2} \delta^{-1} \epsilon^{-3})$ communication rounds, where $\gamma$ is the spectral gap of the mixing matrix of the network. Our results achieve the optimal sample complexity and a sharper communication complexity than existing methods. We also extend our ideas to zeroth-order optimization. Moreover, numerical experiments show the empirical advantage of our methods.
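
For readers unfamiliar with the target notion, the standard definition of a $(\delta, \epsilon)$-Goldstein stationary point from the nonsmooth optimization literature is sketched below. The symbols $f$, $\partial f$ (the Clarke subdifferential), $\mathbb{B}_\delta(x)$ (the closed ball of radius $\delta$ around $x$), and the averaged objective over $n$ clients are notation assumed here for illustration; the paper's own statement may differ in details, for example by requiring the condition in expectation.

```latex
% Assumed notation: f is the global objective, f_i the local function of client i.
f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
\partial_\delta f(x) := \mathrm{conv}\Big( \bigcup_{y \in \mathbb{B}_\delta(x)} \partial f(y) \Big),

% x is (\delta, \epsilon)-Goldstein stationary if the Goldstein
% \delta-subdifferential contains an element of norm at most \epsilon:
\min_{g \in \partial_\delta f(x)} \| g \| \le \epsilon .
```

Smaller $\delta$ and $\epsilon$ both make the condition stricter, which is consistent with the complexity bounds above growing as $\delta^{-1} \epsilon^{-3}$.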

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 6·Confidence 4

Strengths

1. The proposed results have theoretical guarantees.

2. The proposed algorithm has significantly better performance compared to existing algorithms (see Table 1).

Weaknesses

**Minor comments:**

P.18 line 966. $g_i^{k,t}$ is not defined in Alg 1 and Theorem 1; only $g_{i^t}^{k,t}$ is defined.

P.23 line 1231. Can you explain why you assume $\delta < 1$? I think we can't select $\delta$.

**Typos:**

P.19 line 978. $y_i^{k, t-1/2} \to y_i^{k, t-1}$

P.19 line 984. $2n \to n$

P.19 line 1024. $x^{k,t-1} \to \overline{x}^{k, t-1}$

P.24 line 1262. $\sqrt{T} \to 1$

P.24 line 1295. $\nu + L \to \nu + L\delta$, also for Theorem 3

P.25 line 1319. $4 \to 8$

P.25 line 1319. $k

Reviewer 02·Rating 4·Confidence 3

Strengths

1. The paper solves the decentralized nonsmooth nonconvex optimization problem with a sharper upper bound than existing methods.

2. It incorporates client sampling and Chebyshev acceleration into the framework of online-to-nonconvex conversion, which does not require all clients to access their local oracles in every computation round and thus reduces the computation cost.

3. The experimental results support the theoretical findings.
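
As an editorial illustration of why accelerated communication can yield a $\gamma^{-1/2}$ rather than $\gamma^{-1}$ dependence on the spectral gap, the following minimal sketch compares plain gossip averaging with one common heavy-ball/Chebyshev-type accelerated recurrence on a ring network. Everything here is an assumption made for illustration: the ring topology, the uniform weights of 1/3, the closed-form second eigenvalue for that particular ring, and the function names. It is not taken from the paper, whose exact communication procedure is not shown on this page.

```python
import math


def ring_mix(x):
    """One gossip round on a ring: each node averages itself and its two
    neighbours with weight 1/3 each (a symmetric, doubly stochastic mixing)."""
    n = len(x)
    return [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def plain_gossip(x, rounds):
    """Repeatedly apply the mixing step."""
    for _ in range(rounds):
        x = ring_mix(x)
    return x


def accelerated_gossip(x, rounds, lambda2):
    """Accelerated gossip: x_{k+1} = (1 + eta) * W x_k - eta * x_{k-1},
    with eta chosen from lambda2, the second-largest eigenvalue of W."""
    s = math.sqrt(1.0 - lambda2 ** 2)
    eta = (1.0 - s) / (1.0 + s)
    prev, cur = list(x), list(x)  # x_{-1} = x_0 keeps the average unchanged
    for _ in range(rounds):
        mixed = ring_mix(cur)
        nxt = [(1.0 + eta) * m - eta * p for m, p in zip(mixed, prev)]
        prev, cur = cur, nxt
    return cur


def consensus_error(x):
    """Euclidean distance of x from the all-average vector."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))


n = 20
# Closed form for this specific ring with weights 1/3 (assumption of the sketch).
lambda2 = 1.0 / 3.0 + (2.0 / 3.0) * math.cos(2.0 * math.pi / n)
gamma = 1.0 - lambda2  # spectral gap under the "1 - lambda2" convention

x0 = [float(i) for i in range(n)]
x_plain = plain_gossip(x0, 50)
x_fast = accelerated_gossip(x0, 50, lambda2)
err_plain = consensus_error(x_plain)
err_fast = consensus_error(x_fast)
print(f"gamma={gamma:.4f} plain={err_plain:.3e} accelerated={err_fast:.3e}")
```

With the same number of communication rounds, the accelerated recurrence should leave a much smaller consensus error on this poorly connected ring, while both variants preserve the network average.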

Weaknesses

1. The numerical experiments are limited to only two models. More experiments on different models and more recent datasets would strengthen the empirical validation of the proposed method.

Reviewer 03·Rating 6·Confidence 3

Strengths

1. Introduces the first decentralized nonsmooth nonconvex method supporting client sampling, enhancing the scalability of the proposed algorithm.

2. Achieves optimal sample complexity and sharper communication bounds than other state-of-the-art methods.

3. Extends the proposed algorithm to the setting of zeroth-order optimization with dimension-dependent complexity $\mathcal{O}(d\delta^{-1}\epsilon^{-3})$.

4. Theoretical results are well-grounded and consistent with known lower bounds.

5.
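
As an editorial illustration of where a dimension factor $d$ can enter in zeroth-order methods, the sketch below implements the standard two-point randomized-smoothing gradient estimator from the zeroth-order literature. It is an assumption that the paper uses an estimator of this form; the function names, the linear test function, and the sample count are hypothetical choices made only so the example can be checked.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere by normalising a Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def two_point_estimate(f, x, delta, rng):
    """Two-point estimator: (d / (2 * delta)) * (f(x + delta*u) - f(x - delta*u)) * u.
    It uses only two function evaluations and no gradient oracle."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


# Hypothetical test function: a linear map, whose gradient is exactly `a`.
a = [1.0, -2.0, 0.5]


def linear(x):
    return sum(ai * xi for ai, xi in zip(a, x))


rng = random.Random(0)
num_samples = 20000
point = [0.3, 0.1, -0.7]
avg = [0.0] * len(a)
for _ in range(num_samples):
    g = two_point_estimate(linear, point, 0.1, rng)
    avg = [s + gi / num_samples for s, gi in zip(avg, g)]
print("averaged estimate:", [round(v, 2) for v in avg])
```

Each individual estimate is scaled by $d$ and is noisy, so averaging many samples is needed before it approaches the true gradient; this is one intuition for why zeroth-order bounds typically carry an extra factor of the dimension.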

Weaknesses

Assumptions (Lipschitz continuity, fixed mixing matrix) may be restrictive in heterogeneous or time-varying networks.

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques