FAB: A First-Order AB-based Gradient Algorithm for Distributed Bilevel Optimization over Time-Varying Directed Graphs

Yaoshuai Ma; Xiao Wang; Wei Yao; Jin Zhang

arXiv:2605.06328 · math.OC · May 8, 2026

TL;DR

This paper introduces FAB, a fully first-order gradient algorithm for distributed bilevel optimization over time-varying directed graphs. It targets hyperparameter tuning and the consensus bias caused by unbalanced communication, and it is supported by a convergence analysis and empirical validation.

Contribution

The paper proposes a fully first-order distributed bilevel optimization algorithm that combines Push-Pull (AB) communication with a penalty method, and it analyzes the algorithm's convergence over time-varying directed graphs.
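To make the Push-Pull (AB) communication pattern concrete, the sketch below runs plain AB gradient tracking on a single-level toy problem over two alternating directed graphs. It is not FAB itself: the bilevel structure and the penalty term are omitted, and the function names, the quadratic local objectives, the graph topologies, and the step size are all assumptions chosen for illustration. Each node "pulls" decision variables through a row-stochastic matrix `A` and "pushes" gradient-tracker mass through a column-stochastic matrix `B`.

```python
import numpy as np

def mixing_matrices(adj):
    """Build (A, B) from a directed adjacency matrix with self-loops.

    adj[i, j] = 1 means node j sends to node i.
    A is row-stochastic (used to mix decision variables),
    B is column-stochastic (used to mix gradient trackers).
    """
    A = adj / adj.sum(axis=1, keepdims=True)
    B = adj / adj.sum(axis=0, keepdims=True)
    return A, B

def push_pull(local_grads, x0, graphs, alpha, iters):
    """AB / Push-Pull gradient tracking over a periodic graph sequence."""
    x = x0.copy()
    g_old = local_grads(x)
    y = g_old.copy()  # trackers start at the local gradients
    for k in range(iters):
        A, B = mixing_matrices(graphs[k % len(graphs)])
        x = A @ (x - alpha * y)      # pull step on decision variables
        g_new = local_grads(x)
        y = B @ y + g_new - g_old    # push step on gradient trackers
        g_old = g_new
    return x

n, d = 4, 2
rng = np.random.default_rng(0)
c = rng.normal(size=(n, d))  # hypothetical local data: f_i(x) = 0.5 * ||x - c_i||^2

def local_grads(x):
    # Row i holds the gradient of f_i evaluated at node i's copy of x.
    return x - c

# Graph 1: directed ring 0 -> 1 -> 2 -> 3 -> 0 plus the extra edge 0 -> 2.
g1 = np.eye(n)
for j in range(n):
    g1[(j + 1) % n, j] = 1.0
g1[2, 0] = 1.0

# Graph 2: the reversed ring plus the extra edge 1 -> 3.
g2 = np.eye(n)
for j in range(n):
    g2[j, (j + 1) % n] = 1.0
g2[3, 1] = 1.0

graphs = [g1, g2]
x_final = push_pull(local_grads, np.zeros((n, d)), graphs, alpha=0.05, iters=2000)
target = c.mean(axis=0)  # minimizer of the summed quadratics

print("max deviation from the minimizer:", np.abs(x_final - target).max())
```

Because `B` is column-stochastic, the sum of the trackers always equals the sum of the current local gradients, which is what lets every node steer toward the minimizer of the global objective even though neither graph is weight-balanced.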

Findings

01 Convergence rate established for the Push-Pull algorithm in nonconvex settings.

02 The algorithm effectively handles hyperparameter tuning in distributed tasks.

03 Empirical results validate efficiency across hyperparameter tuning, data cleaning, and reinforcement learning.

Abstract

Distributed optimization over time-varying directed graphs has shown promising performance in addressing challenges posed by complex communication constraints in real-world scenarios. In many practical settings, however, the direct application of distributed optimization algorithms encounters additional difficulties, most notably hyperparameter tuning, which our empirical observations suggest can be effectively mitigated by integrating bilevel optimization. Motivated by these findings, we study distributed bilevel optimization over time-varying directed networks, a problem that remains largely unexplored due to the compounded challenges arising from consensus bias in dynamic unbalanced communication and the nested optimization structure. In this work, we propose a fully first-order distributed gradient-based algorithm that integrates the Push-Pull (also known as AB) communication…
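For orientation, the nested structure mentioned in the abstract is commonly written as below, together with a value-function penalty reformulation of the kind that fully first-order bilevel methods typically use. The notation is assumed here for illustration and may differ from the paper's: $n$ agents, local upper-level objectives $f_i$, local lower-level objectives $g_i$, and a penalty parameter $\lambda > 0$.

```latex
% Distributed bilevel problem (assumed generic form)
\min_{x} \; \frac{1}{n}\sum_{i=1}^{n} f_i\bigl(x, y^*(x)\bigr)
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y} \; \frac{1}{n}\sum_{i=1}^{n} g_i(x, y)

% Value-function penalty reformulation (assumed generic form)
\min_{x,\,y} \; \frac{1}{n}\sum_{i=1}^{n} f_i(x, y)
+ \lambda \left[ \frac{1}{n}\sum_{i=1}^{n} g_i(x, y)
- \min_{z} \frac{1}{n}\sum_{i=1}^{n} g_i(x, z) \right]
```

The bracketed term is nonnegative and vanishes exactly when $y$ solves the lower-level problem, so gradients of the penalized objective involve only first derivatives of $f_i$ and $g_i$, with no Hessians or Hessian inverses.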
