A Communication and Computation Efficient Fully First-order Method for Decentralized Bilevel Optimization
Min Wen, Chengchang Liu, Ahmed Abdelmoniem, Yipeng Zhou, and Yuedong Xu

TL;DR
This paper introduces C^2DFB, a fully first-order decentralized bilevel optimization method that reduces computation and communication costs by using only gradient information and efficient residual transmission, suitable for federated learning tasks.
Contribution
It proposes a novel first-order decentralized bilevel optimization algorithm, C^2DFB, that avoids second-order computations and employs a lightweight communication protocol for efficiency.
Findings
Achieves convergence with on the order of ε^{-4} first-order oracle calls
Demonstrates superior performance in hyperparameter tuning tasks
Effective across various data distributions and network topologies
Abstract
Bilevel optimization, crucial for hyperparameter tuning, meta-learning and reinforcement learning, remains less explored in the decentralized learning paradigm, such as decentralized federated learning (DFL). Typically, decentralized bilevel methods rely on both gradients and Hessian matrices to approximate hypergradients of upper-level models. However, acquiring and sharing the second-order oracle is compute and communication intensive. To overcome these challenges, this paper introduces a fully first-order decentralized method for decentralized bilevel optimization, C^2DFB, which is both computation- and communication-efficient. In C^2DFB, each learning node optimizes a min-min-max problem to approximate the hypergradient by exclusively using gradient information. To reduce the traffic load at the inner-loop…
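The idea of approximating a hypergradient from gradients alone can be illustrated on a toy scalar problem. The sketch below is not the C^2DFB algorithm and involves no decentralization or communication; it is a single-node example of a standard penalty-based first-order estimate, and the objectives f and g, the penalty weight lam, the step sizes, and all function names are assumptions chosen for illustration. With g(x, y) = 0.5 (y - x)^2 the lower-level solution is y*(x) = x, so the exact hypergradient of f(x, y*(x)) with f(x, y) = 0.5 (y - 1)^2 + 0.5 x^2 is 2x - 1, which gives a reference value to compare against.

```python
# Toy single-node sketch (hypothetical objectives, not the paper's algorithm).
# Upper level: f(x, y) = 0.5*(y - 1)^2 + 0.5*x^2
# Lower level: g(x, y) = 0.5*(y - x)^2, so y*(x) = x and the exact
# hypergradient of f(x, y*(x)) is 2x - 1.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0


def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return x - y, y - x


def inner_descent(grad_y, y0, step, iters):
    """Plain gradient descent on y using only first-order information."""
    y = y0
    for _ in range(iters):
        y -= step * grad_y(y)
    return y


def first_order_hypergrad(x, lam=1000.0, iters=200):
    """Penalty-based estimate of the hypergradient; no Hessians are used.

    lam is an assumed penalty weight; larger values reduce the bias.
    """
    # Approximate y*(x) = argmin_y g(x, y).
    y_star = inner_descent(lambda y: grad_g(x, y)[1], 0.0, 0.5, iters)
    # Approximate y_lam(x) = argmin_y f(x, y) + lam * g(x, y).
    # The step 1/(1 + lam) matches the curvature of this toy quadratic.
    y_lam = inner_descent(
        lambda y: grad_f(x, y)[1] + lam * grad_g(x, y)[1],
        0.0,
        1.0 / (1.0 + lam),
        iters,
    )
    fx, _ = grad_f(x, y_lam)
    gx_lam, _ = grad_g(x, y_lam)
    gx_star, _ = grad_g(x, y_star)
    return fx + lam * (gx_lam - gx_star)


def exact_hypergrad(x):
    """Closed-form hypergradient for this toy problem only."""
    return 2.0 * x - 1.0
```

For this toy problem the estimate differs from the exact value by (x - 1) / (1 + lam), so the bias shrinks as lam grows, at the cost of a stiffer inner problem.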
Taxonomy
Topics: Matrix Theory and Algorithms · Optimization and Variational Analysis · Advanced Optimization Algorithms Research
