TL;DR
This paper develops online control algorithms for multi-agent systems facing adversarial disturbances, providing regret bounds and equilibrium guarantees that enhance robustness and performance in dynamic environments.
Contribution
It introduces a novel analysis of online gradient-based control for multi-agent systems under adversarial disturbances, bridging online learning and game theory.
Findings
Sublinear regret bounds for individual agents.
Near-optimal scaling with the number of agents.
Guarantees for equilibrium tracking in potential games.
Abstract
Online multi-agent control problems, where many agents pursue competing and time-varying objectives, are widespread in domains such as autonomous robotics, economics, and energy systems. In these settings, robustness to adversarial disturbances is critical. In this paper, we study online control in multi-agent linear dynamical systems subject to such disturbances. In contrast to most prior work in multi-agent control, which typically assumes noiseless or stochastically perturbed dynamics, we consider an online setting where disturbances can be adversarial, and where each agent seeks to minimize its own sequence of convex losses. Under two feedback models, we analyze online gradient-based controllers with local policy updates. We prove per-agent regret bounds that are sublinear and near-optimal in the time horizon and that highlight different scalings with the number of agents. When…
Peer Reviews
Decision·Submitted to ICLR 2026
The paper puts forward an interesting setup, and it is an interesting question what equilibria arise from self-centered learning behaviors in this setting. The closest work I am aware of (Ghai et al.) considers only the cooperative setting, where the feedback is limited but the cost function is shared. This paper, on the other hand, models disparate costs per agent.
In my reading, a major weakness is that many results (Theorems 3.2, 3.3, 3.4) in the paper follow as a black box (entirely or almost) from known results, and hence the contribution of the present work in these contexts is restricted to framing. Note that the regret is defined treating other agents' actions as fixed. Thus, Theorem 3.4 is in fact a corollary of earlier work on online non-stochastic control, purely by including the other players' actions in the "disturbances". The size of disturbances in
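A minimal sketch of the reduction described in this review, under assumptions that are not taken from the paper: a scalar state, dynamics of the form x_{t+1} = a*x_t + sum_i b_i*u_{i,t} + w_t, and controls bounded by a hypothetical constant U. All function names and numeric values are illustrative. The sketch only checks that, from agent 0's point of view, the joint system is indistinguishable from a single-agent system whose disturbance absorbs the other agents' inputs.

```python
import random

def simulate_joint(a, b, controls, w, x0=0.0):
    """Roll out x_{t+1} = a*x_t + sum_i b[i]*u_i[t] + w[t] (scalar state, assumed form)."""
    xs = [x0]
    for t in range(len(w)):
        drive = sum(b[i] * controls[i][t] for i in range(len(b)))
        xs.append(a * xs[-1] + drive + w[t])
    return xs

def effective_disturbance(i, b, controls, w):
    """Disturbance seen by agent i once the other agents' inputs are absorbed into it."""
    return [
        w[t] + sum(b[j] * controls[j][t] for j in range(len(b)) if j != i)
        for t in range(len(w))
    ]

def simulate_single(a, b_i, u_i, w_eff, x0=0.0):
    """Single-agent rollout x_{t+1} = a*x_t + b_i*u_i[t] + w_eff[t]."""
    xs = [x0]
    for t in range(len(w_eff)):
        xs.append(a * xs[-1] + b_i * u_i[t] + w_eff[t])
    return xs

def demo(seed=0, T=50):
    rng = random.Random(seed)
    a = 0.9                 # stable open-loop dynamics (illustrative value)
    b = [1.0, 0.5, -0.7]    # one input gain per agent (illustrative values)
    W, U = 1.0, 2.0         # assumed bounds on disturbances and controls
    w = [rng.uniform(-W, W) for _ in range(T)]
    controls = [[rng.uniform(-U, U) for _ in range(T)] for _ in b]

    joint = simulate_joint(a, b, controls, w)
    w_eff = effective_disturbance(0, b, controls, w)
    single = simulate_single(a, b[0], controls[0], w_eff)

    # Largest difference between the two state trajectories (float round-off only).
    max_gap = max(abs(p - q) for p, q in zip(joint, single))
    # Worst-case size of the absorbed disturbance; grows with the number of other agents.
    bound = W + U * sum(abs(b[j]) for j in range(1, len(b)))
    return max_gap, max(abs(v) for v in w_eff), bound

if __name__ == "__main__":
    print(demo())
```

In this toy setting the bound on the absorbed disturbance grows with the number of other agents, which is one way a dependence on the number of agents could enter a regret bound obtained this way; whether the paper's analysis proceeds along these lines is not stated in the excerpt.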
1. The paper tackles a challenging and relevant problem at the intersection of online non-stochastic control and online learning in games.
2. The paper provides a strong theoretical analysis. The analysis under two different information structures (Settings 1 and 2) provides a clear understanding of the value of information and the price of decentralization. The $\Omega(\sqrt{T})$ lower bound confirms the optimality of the bounds with respect to $T$.
3. The equilibrium tracking result (Theorem 4
1. The algorithm assumes that agents have perfect knowledge of the system dynamics ($A$ and their own $B_i$). This is a common but significant limitation. While unknown dynamics are mentioned as future work, a brief discussion in the main text of the specific challenges this would introduce (e.g., the need for system identification competing with the adversarial disturbances) would be beneficial.
2. The algorithm requires agents to possess a stabilizing linear controller $K_i$ a priori. More
This work extends [Agarwal 2019] in an interesting direction. Analyzing regret, convergence, and stability for multi-agent systems is always challenging. To address the challenge, the authors identify nontrivial multi-agent setups in which learning can happen. Further, the paper is technically sound and easy to follow. The authors may benefit from a better comparison of their analysis to the original analysis in Agarwal, and from highlighting the technical difficulty.
My concern is about how nontrivial the analyses are. For the three results: 1. When each individual only observes their own actions, it seems one doesn't need to care about the other agents' actions because the perturbation is already adversarial. Further, the disturbance caused by other agents' actions is bounded because the system is stable. Therefore, it seems one can just apply the [Agarwal] result directly with N different copies. 2. When the individual agent can observe actions from other agents, they c
