Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games
Wenhao Zhan, Jason D. Lee, Zhuoran Yang

TL;DR
This paper introduces DORIS, a decentralized optimistic hyperpolicy mirror descent algorithm that achieves no-regret learning in Markov games with nonstationary opponents. When all agents run DORIS, their mixture policy forms an approximate coarse correlated equilibrium.
Contribution
The paper proposes DORIS, a novel algorithm for decentralized no-regret learning in Markov games with function approximation. It proves a regret bound against the best policy in hindsight and shows that the agents' mixture policy is an approximate coarse correlated equilibrium.
Findings
Achieves $\sqrt{K}$-regret with general function approximation, where $K$ is the number of episodes.
When all agents run DORIS, their mixture policy forms an approximate coarse correlated equilibrium (CCE).
Applicable to constrained and vector-valued MDPs modeled as zero-sum Markov games.
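To make "approximate coarse correlated equilibrium" concrete, here is a minimal sketch of the one-state, one-step (normal-form, two-player) analogue of the notion; the Markov-game definition in the paper compares policy values instead of one-shot payoffs. A distribution over joint actions is an $\epsilon$-approximate CCE if no player can gain more than $\epsilon$ by committing in advance to a single fixed action while the others keep following the distribution. The function name `cce_gap` and the payoff-matrix layout are illustrative assumptions, not anything defined in the paper.

```python
import numpy as np

def cce_gap(joint_dist, payoffs):
    """Largest gain any player gets by deviating to a fixed action (hypothetical helper).

    joint_dist: (n1, n2) probability distribution over joint actions (a1, a2).
    payoffs:    [u1, u2], each of shape (n1, n2); ui[a1, a2] is player i's payoff.
    The distribution is an eps-approximate CCE iff the returned gap is <= eps.
    """
    p = np.asarray(joint_dist, dtype=float)
    u1 = np.asarray(payoffs[0], dtype=float)
    u2 = np.asarray(payoffs[1], dtype=float)

    # Player 1: value of following p vs. best fixed a1' against the marginal of a2.
    base1 = np.sum(p * u1)
    marg2 = p.sum(axis=0)              # marginal distribution of player 2's action
    dev1 = u1 @ marg2                  # dev1[a1'] = E[u1(a1', a2)]

    # Player 2: symmetric computation against the marginal of a1.
    base2 = np.sum(p * u2)
    marg1 = p.sum(axis=1)              # marginal distribution of player 1's action
    dev2 = marg1 @ u2                  # dev2[a2'] = E[u2(a1, a2')]

    return max(dev1.max() - base1, dev2.max() - base2)

# Matching pennies: the uniform joint distribution is an exact CCE (gap 0),
# while always playing (heads, heads) lets player 2 gain 2 by switching.
u1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(cce_gap(np.full((2, 2), 0.25), [u1, -u1]))
print(cce_gap([[1.0, 0.0], [0.0, 0.0]], [u1, -u1]))
```

Because the condition only involves fixed-action deviations, a time-averaged (mixture) distribution of play can satisfy it even when no single round is an equilibrium, which is why mixture policies are the natural object here.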
Abstract
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents. Our goal is to develop a no-regret online learning algorithm that (i) takes actions based on the local information observed by the agent and (ii) is able to find the best policy in hindsight. For such a problem, the nonstationary state transitions due to the varying opponent pose a significant challenge. In light of a recent hardness result \citep{liu2022learning}, we focus on the setting where the opponent's previous policies are revealed to the agent for decision making. With such an information structure, we propose a new algorithm, \underline{D}ecentralized \underline{O}ptimistic hype\underline{R}policy m\underline{I}rror de\underline{S}cent (DORIS), which achieves $\sqrt{K}$-regret in the context of general function approximation,…
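The algorithm's name points to its core mechanism: a hyperpolicy (a distribution over policies) updated by mirror descent, with regret measured against the best policy in hindsight. The sketch below shows entropic mirror descent (exponential weights) over a small finite policy class, assuming each candidate policy's value in every episode is simply handed to us as a number in [0, 1]. DORIS itself must estimate these values from observed data (optimistically, as its name suggests) and works with general function approximation; none of that is modeled here, and the function name is hypothetical.

```python
import numpy as np

def hyperpolicy_mirror_descent(values, eta):
    """Exponential-weights sketch of mirror descent over a finite policy class.

    values: (K, N) array; values[k, j] is the assumed-known value in [0, 1] of
            candidate policy j in episode k against that episode's opponent.
    eta:    step size of the mirror-descent update.
    Returns the hyperpolicy used in each episode (K, N) and the regret against
    the best fixed policy in hindsight.
    """
    values = np.asarray(values, dtype=float)
    K, N = values.shape
    log_w = np.zeros(N)                      # uniform initial hyperpolicy
    hyper = np.empty((K, N))
    for k in range(K):
        # Normalize in log space for numerical stability.
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        hyper[k] = p
        # Mirror step with the negative-entropy mirror map:
        # q_{k+1}(j) is proportional to q_k(j) * exp(eta * V_k(j)).
        log_w += eta * values[k]
    earned = np.sum(hyper * values)          # expected value collected by the hyperpolicies
    best = values.sum(axis=0).max()          # best single policy in hindsight
    return hyper, best - earned

rng = np.random.default_rng(0)
K, N = 1000, 8
V = rng.uniform(size=(K, N))
_, regret = hyperpolicy_mirror_descent(V, eta=np.sqrt(8 * np.log(N) / K))
# Compare against the classical exponential-weights bound sqrt(K * ln(N) / 2).
print(regret, np.sqrt(K * np.log(N) / 2))
```

With this step size the classical exponential-weights analysis bounds the regret by $\sqrt{K \ln N / 2}$, which grows like $\sqrt{K}$; this is only the textbook finite-class, full-information guarantee and not the paper's bound for Markov games.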
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Smart Grid Energy Management
