No-Regret Learning in Network Stochastic Zero-Sum Games
Shijie Huang, Jinlong Lei, Yiguang Hong

TL;DR
This paper introduces a distributed stochastic mirror descent method for network stochastic zero-sum games. It provides regret bounds and guarantees convergence to Nash equilibria when players have only local information and the costs are uncertain.
Contribution
The paper develops a novel distributed stochastic mirror descent algorithm for network stochastic zero-sum games, together with theoretical regret bounds and a convergence analysis.
Findings
Expected regret bounds of O(√T) and O(log T) for convex-concave and strongly convex-strongly concave costs, respectively.
Convergence of time-averaged iterates to Nash equilibria.
Almost sure convergence of actual iterates in strictly convex-concave settings.
Abstract
No-regret learning has been widely used to compute a Nash equilibrium in two-person zero-sum games. However, regret analysis is still lacking for network stochastic zero-sum games, in which players competing in two subnetworks have access only to local information and the cost functions include uncertainty. Such a game model arises in security games, where a group of inspectors works together to detect a group of evaders. In this paper, we propose a distributed stochastic mirror descent (D-SMD) method and establish regret bounds of O(√T) and O(log T) in the expected sense for convex-concave and strongly convex-strongly concave costs, respectively. Our bounds match those of the best known first-order online optimization algorithms. We then prove that the time-averaged iterates of D-SMD converge to the set of Nash equilibria. Finally, we show that the…
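A minimal Python sketch can make the general shape of such an update concrete. It runs a generic "consensus, then stochastic mirror step" loop for a bilinear network game. It is not the authors' D-SMD. The following are all assumptions made for illustration: the bilinear cost xᵀAy, the entropic mirror map on the probability simplex, the doubly stochastic mixing matrices, the one-to-one observation links between the subnetworks, Gaussian gradient noise, and the step size η_t = η0/√t. Every function name is hypothetical.

```python
import math
import random


def normalize(v):
    s = sum(v)
    return [a / s for a in v]


def matvec(M, v):
    # M @ v
    return [sum(row[c] * v[c] for c in range(len(v))) for row in M]


def rmatvec(M, v):
    # M^T @ v
    return [sum(M[r][c] * v[r] for r in range(len(M))) for c in range(len(M[0]))]


def mix(W, P, i):
    # Consensus step: agent i forms a convex combination of estimates using row i of W.
    return [sum(W[i][k] * P[k][d] for k in range(len(P))) for d in range(len(P[0]))]


def entropic_step(p, grad, eta):
    # Mirror step with the negative-entropy mirror map on the simplex:
    # argmin_x eta*<grad, x> + KL(x || p)  gives  x_k proportional to p_k * exp(-eta * grad_k).
    # Shifting grad by its minimum avoids overflow and leaves the result unchanged.
    g0 = min(grad)
    return normalize([pk * math.exp(-eta * (gk - g0)) for pk, gk in zip(p, grad)])


def d_smd_sketch(A_loc, B_loc, W1, W2, T, eta0=0.5, noise=0.1, seed=0):
    """Hypothetical D-SMD-style loop for the cost x^T A y (x minimises, y maximises).

    A_loc[i]: local cost matrix of agent i in subnetwork 1 (assumed to average to A).
    B_loc[j]: local payoff matrix of agent j in subnetwork 2 (assumed to average to A).
    W1, W2: doubly stochastic mixing matrices of the two subnetworks.
    """
    rng = random.Random(seed)
    m1, m2 = len(A_loc), len(B_loc)
    n, q = len(A_loc[0]), len(A_loc[0][0])
    xs = [normalize([rng.random() + 0.1 for _ in range(n)]) for _ in range(m1)]
    ys = [normalize([rng.random() + 0.1 for _ in range(q)]) for _ in range(m2)]
    x_avg, y_avg = [0.0] * n, [0.0] * q
    for t in range(1, T + 1):
        eta = eta0 / math.sqrt(t)  # diminishing step size (assumed schedule)
        new_xs = []
        for i in range(m1):
            y_seen = ys[i % m2]  # assumed link: agent i observes opponent i mod m2
            grad = [g + rng.gauss(0.0, noise) for g in matvec(A_loc[i], y_seen)]
            new_xs.append(entropic_step(mix(W1, xs, i), grad, eta))
        new_ys = []
        for j in range(m2):
            x_seen = xs[j % m1]  # uses the previous round's estimates (synchronous update)
            grad = [g + rng.gauss(0.0, noise) for g in rmatvec(B_loc[j], x_seen)]
            # The maximiser ascends, i.e. takes a mirror step on the negated gradient.
            new_ys.append(entropic_step(mix(W2, ys, j), [-g for g in grad], eta))
        xs, ys = new_xs, new_ys
        # Running time average of the subnetwork-mean iterates.
        x_bar = [sum(x[k] for x in xs) / m1 for k in range(n)]
        y_bar = [sum(y[k] for y in ys) / m2 for k in range(q)]
        x_avg = [a + (b - a) / t for a, b in zip(x_avg, x_bar)]
        y_avg = [a + (b - a) / t for a, b in zip(y_avg, y_bar)]
    return xs, ys, x_avg, y_avg
```

For a quick check, use three agents per subnetwork with local matrices that average to [[0, 1], [1, 2]], for example [[0.5, 1], [1, 1.5]], [[-0.5, 1], [1, 2.5]] and [[0, 1], [1, 2]]. This game has a strict pure equilibrium, so the time averages should approach x = (1, 0) and y = (0, 1). If all local matrices are replaced by matching pennies, [[1, -1], [-1, 1]], the game is convex-concave but not strictly so. There, mirror-descent iterates typically circle the equilibrium while their time averages approach (0.5, 0.5). This contrast mirrors the findings' distinction between convergence of time-averaged iterates and convergence of actual iterates.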
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Age of Information Optimization
