Decentralized Nash Equilibria Learning for Online Game with Bandit Feedback
Min Meng, Xiuxian Li, Jie Chen

TL;DR
This paper introduces a distributed online algorithm for learning generalized Nash equilibria in time-varying games with bandit feedback, achieving sublinear regret and constraint violation, even with delays.
Contribution
It proposes a novel mirror descent-based distributed algorithm for online Nash equilibrium seeking with bandit feedback and delays, extending prior methods to dynamic, constrained settings.
Findings
Achieves sublinear expected regret and accumulated constraint violation, provided the path variation of the generalized Nash equilibrium sequence is sublinear.
Extends to delayed feedback scenarios with similar guarantees.
Validated through simulations demonstrating effectiveness.
Abstract
This paper studies distributed online bandit learning of generalized Nash equilibria for an online game, where the cost functions of all players and the coupled constraints are time-varying. Only the values of the cost and local constraint functions, rather than full information about them, are revealed to local players gradually. The goal of each player is to selfishly minimize its own cost function with no future information, subject to a strategy set constraint and time-varying coupled inequality constraints. To this end, a distributed online algorithm based on mirror descent and one-point bandit feedback is designed for seeking generalized Nash equilibria of the online game. It is shown that the devised online algorithm achieves sublinear expected regrets and accumulated constraint violation if the path variation of the generalized Nash equilibrium sequence is sublinear. Furthermore, the proposed algorithm is…
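To make the two ingredients named in the abstract concrete, here is a minimal single-player sketch in Python. It is not the paper's distributed algorithm: it ignores the other players, the coupled inequality constraints, and delays, and it assumes a Euclidean distance-generating function, so the mirror descent step reduces to a projected gradient step. The function names, the ball-shaped strategy set, and the parameters delta (exploration radius) and eta (step size) are all illustrative assumptions.

```python
import numpy as np

def one_point_gradient(cost_value, u, delta):
    """One-point estimate (d / delta) * f(x + delta * u) * u built from a single cost value."""
    d = u.shape[0]
    return (d / delta) * cost_value * u

def sample_unit_sphere(rng, d):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def bandit_mirror_descent(cost, d, steps, radius=1.0, delta=0.1, eta=0.01, seed=0):
    """Single-player sketch: Euclidean mirror descent driven by one-point bandit feedback.

    cost(x, t) returns only the scalar cost at time t (assumed interface).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    shrunk = radius - delta  # keeps every queried point x + delta * u inside the strategy set
    for t in range(steps):
        u = sample_unit_sphere(rng, d)
        value = cost(x + delta * u, t)           # only the value is observed, never the gradient
        g = one_point_gradient(value, u, delta)  # randomized gradient estimate
        x = project_ball(x - eta * g, shrunk)    # Euclidean mirror (projected) step
    return x

# Illustrative use with a fixed quadratic cost (a hypothetical example, not from the paper).
x_final = bandit_mirror_descent(lambda x, t: float(np.sum((x - 0.3) ** 2)), d=2, steps=200)
```

Iterates are projected onto a ball shrunk by delta so that the perturbed query point always stays feasible. One-point estimates have high variance, so in practice delta and eta would typically be decreased over time.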
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics
