SUN-DSBO: A Structured Unified Framework for Nonconvex Decentralized Stochastic Bilevel Optimization
Yaoshuai Ma, Xiao Wang, Wei Yao, and Jin Zhang

TL;DR
SUN-DSBO is a unified framework for nonconvex decentralized stochastic bilevel optimization. It can incorporate techniques such as gradient tracking, and its gradient-tracking variant achieves linear speedup without assuming gradient boundedness or specific heterogeneity conditions.
Contribution
The paper introduces SUN-DSBO, a comprehensive framework for nonconvex DSBO that incorporates various optimization techniques and demonstrates improved scalability and robustness.
Findings
SUN-DSBO-GT achieves linear speedup with respect to the number of agents.
The framework works without assuming gradient boundedness or specific heterogeneity conditions.
Numerical experiments confirm the effectiveness of the proposed method.
Abstract
Decentralized stochastic bilevel optimization (DSBO) is a powerful tool for various machine learning tasks, including decentralized meta-learning and hyperparameter tuning. Existing DSBO methods primarily address problems with strongly convex lower-level objective functions. However, nonconvex objective functions are increasingly prevalent in modern deep learning. In this work, we introduce SUN-DSBO, a Structured Unified framework for Nonconvex DSBO, in which both the upper- and lower-level objective functions may be nonconvex. Notably, SUN-DSBO offers the flexibility to incorporate decentralized stochastic gradient descent or various techniques for mitigating data heterogeneity, such as gradient tracking (GT). We demonstrate that SUN-DSBO-GT, an adaptation of the GT technique within our framework, achieves a linear speedup with respect to the number of agents. This is accomplished…
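To make the gradient tracking (GT) technique named in the abstract concrete, here is a minimal sketch of plain single-level decentralized gradient tracking on a toy problem. It is not the SUN-DSBO-GT algorithm: the ring topology, the quadratic local objectives, the exact (non-stochastic) gradients, and all names (`ring_mixing_matrix`, `gradient_tracking`, `targets`, `tracker`) are assumptions chosen for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n agents (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        # Each agent gives weight 1/3 to itself and to each ring neighbour.
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def gradient_tracking(targets, W, step=0.1, iters=500):
    """Gradient tracking for local objectives f_i(x) = 0.5 * ||x - targets[i]||^2.

    Row i of `x` is agent i's local copy of the decision variable.
    Row i of `tracker` is agent i's running estimate of the average gradient.
    """
    def local_grads(x):
        # Row i is the gradient of f_i evaluated at agent i's own copy.
        return x - targets

    x = np.zeros_like(targets, dtype=float)
    grads = local_grads(x)
    tracker = grads.copy()  # standard initialisation: tracker_0 = gradient at x_0
    for _ in range(iters):
        # Mix with neighbours, then step along the tracked direction.
        x_next = W @ x - step * tracker
        grads_next = local_grads(x_next)
        # Mix trackers and add the change in each agent's local gradient.
        tracker = W @ tracker + grads_next - grads
        x, grads = x_next, grads_next
    return x


if __name__ == "__main__":
    targets = np.arange(10.0).reshape(5, 2)  # deliberately heterogeneous local data
    x = gradient_tracking(targets, ring_mixing_matrix(5))
    print(x)  # every row should be close to targets.mean(axis=0)
```

In this toy setting the agents' local minimizers differ, yet every agent's copy approaches the minimizer of the average objective, which is the role GT plays in mitigating data heterogeneity. The stochastic, bilevel setting of the paper adds sampling noise and coupled upper- and lower-level updates that this sketch does not model.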
Peer Reviews
Decision: Submitted to ICLR 2026
1. The proposed SUN-DSBO framework broadens the scope of DSBO by addressing the nonconvexity of the lower-level objective, a setting largely unexplored in decentralized bilevel optimization. 2. SUN-DSBO is flexible enough to accommodate various decentralized schemes (e.g., GT, EXTRA), making it extensible and modular. 3. The paper provides finite-time convergence analysis for both SUN-DSBO-SE and SUN-DSBO-GT under realistic and relaxed assumptions. Notably, SUN-DSBO-GT achieves linear speedup wi
1. Although a direct comparison of the theoretical results with prior DSBO works may be somewhat unfair, given that those methods typically rely on additional assumptions such as strong convexity of the lower-level problem, it is still valuable to include such a comparison to highlight the differences and advancements introduced by this work. 2. While the paper claims that SUN-DSBO-GT introduces more overhead than SUN-DSBO-SE, no quantitative comparison of communication cost, memory usage, or wa
1) This is the first DSBO method that rigorously handles *nonconvex–nonconvex* bilevel objectives in a decentralized, stochastic setting. Previous approaches (SPARKLE, D-SOBA, SLDBO) required lower-level strong convexity or Hessian-based hypergradients; SUN-DSBO works under far milder assumptions. 2) The Moreau-envelope penalty and auxiliary variable $\theta$ yield tractable gradient updates $(D_x, D_y, D_\theta)$ computable via mini-batch gradients, which remove all second-order depende
1) Only the GT strategy is analyzed theoretically; variants using **EXTRA** or **Exact-Diffusion** remain unexplored, and a unified convergence proof would strengthen the framework. 2) The paper lacks a systematic study of the penalty parameters $\mu$ and $\gamma$, which affect stability and constraint tightness. 3) Although GT achieves linear speedup, the communication-vs-computation trade-off (messages per ε-stationary point) is not quantified.
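For readers unfamiliar with the Moreau envelope that this review refers to, its standard definition is sketched below. The symbols $h$, $g$, and $v_\gamma$ are assumptions introduced here for illustration; this summary does not give the paper's exact formulation, so the second line is only the usual way such an envelope would be applied to a lower-level objective.

```latex
% Standard Moreau envelope of a function h with parameter \gamma > 0.
\[
  \operatorname{env}_{\gamma} h(y)
  \;=\; \inf_{\theta} \Big\{ h(\theta) + \frac{1}{2\gamma}\,\lVert \theta - y \rVert^{2} \Big\}
\]
% Assumed bilevel usage: g(x, \cdot) denotes a lower-level objective
% (hypothetical notation, not taken from this summary).
\[
  v_{\gamma}(x, y)
  \;=\; \inf_{\theta} \Big\{ g(x, \theta) + \frac{1}{2\gamma}\,\lVert \theta - y \rVert^{2} \Big\}
\]
```

Under this reading, $\theta$ is the auxiliary variable over which the envelope is minimized, and $\gamma$ controls how strongly $\theta$ is kept close to $y$.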
1. This paper proposes a novel algorithm that solves the DSBO problem with nonconvex upper-level and lower-level objective functions. The authors provide a solid analysis of convergence to stationarity and of consensus among the agents, and they conduct experiments to support their findings. 2. The algorithms achieve linear speedup, a non-trivial result in the decentralized optimization literature. 3. Source code is provided.
1. One major limitation is that the idea and methodology of this paper seem to follow [1]. The proposed algorithms SUN-DSBO-SE/GT seem to be a direct extension of the centralized case considered in [1]. The techniques used in this paper, such as gradient tracking, consensus analysis, and convergence analysis, are standard in the existing distributed optimization literature. 2. The experiments seem to be limited to relatively simple examples/problems. Ref: [1] Moreau envelope for nonconvex bi-level opt
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Optimization and Variational Analysis
