Decentralized Nash Equilibria Learning for Online Game with Bandit Feedback

Min Meng; Xiuxian Li; Jie Chen

arXiv:2204.09467·math.OC·April 21, 2022·IEEE Trans. Autom. Control.

Open Access

TL;DR

This paper introduces a distributed online algorithm for learning generalized Nash equilibria in time-varying games with bandit feedback, achieving sublinear regret and constraint violation, even with delays.

Contribution

It proposes a novel mirror descent-based distributed algorithm for online Nash equilibrium seeking with bandit feedback and delays, extending prior methods to dynamic, constrained settings.

Findings

01

Achieves sublinear expected regret and accumulated constraint violation, provided the path variation of the generalized Nash equilibrium sequence is sublinear.

02

Extends to delayed feedback scenarios with similar guarantees.

03

Validated through simulations demonstrating effectiveness.

Abstract

This paper studies distributed online bandit learning of generalized Nash equilibria for an online game in which the cost functions of all players and the coupled constraints are time-varying. Only the values, rather than full information, of the cost and local constraint functions are revealed to local players gradually. Each player aims to selfishly minimize its own cost function, with no future information, subject to a strategy set constraint and time-varying coupled inequality constraints. To this end, a distributed online algorithm based on mirror descent and one-point bandit feedback is designed for seeking generalized Nash equilibria of the online game. It is shown that the devised online algorithm achieves sublinear expected regret and accumulated constraint violation if the path variation of the generalized Nash equilibrium sequence is sublinear. Furthermore, the proposed algorithm is…
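The two ingredients named in the abstract, one-point bandit feedback and mirror descent, can be illustrated with a minimal single-player sketch. This is not the paper's distributed algorithm: the function names, the quadratic test cost, the ball-shaped strategy set, and the choice of a Euclidean regularizer (under which a mirror descent step reduces to a projected gradient step) are all assumptions made for illustration, and the coupled constraints, communication between players, and delays are left out.

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """Estimate a gradient from ONE function value (one-point bandit feedback).

    Samples a direction u uniformly on the unit sphere and returns
    (d / delta) * f(x + delta * u) * u, whose expectation is the gradient
    of a delta-smoothed version of f.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # normalized Gaussian = uniform direction on the sphere
    return (d / delta) * f(x + delta * u) * u

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius (assumed strategy set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def mirror_descent_step(x, grad_estimate, step, radius):
    """Mirror descent step with a Euclidean regularizer (assumption):
    this special case is a projected gradient step."""
    return project_ball(x - step * grad_estimate, radius)

def mean_gradient_estimate(f, x, delta, n_samples, seed=0):
    """Average many one-point estimates to show they are unbiased for the smoothed cost."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x)
    for _ in range(n_samples):
        total += one_point_gradient(f, x, delta, rng)
    return total / n_samples

# Hypothetical quadratic cost; for a quadratic, the smoothed gradient equals 2 * (x - c).
c = np.array([0.5, -0.25])
cost = lambda z: float(np.sum((z - c) ** 2))
x0 = np.array([0.2, 0.1])
avg = mean_gradient_estimate(cost, x0, delta=0.5, n_samples=50_000)
```

A single estimate is very noisy (its magnitude scales with d / delta), which is why analyses of this kind bound expected regret rather than per-round error. In a full bandit scheme the query point x + delta * u must also stay inside the strategy set, typically by projecting onto a slightly shrunken set; the sketch omits that detail.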

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics