Online Markov Decision Processes with Non-oblivious Strategic Adversary
Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

TL;DR
This paper introduces algorithms for Online Markov Decision Processes (OMDPs) in which the loss function is chosen by a non-oblivious strategic adversary. It gives policy regret bounds, an algorithm aimed at large action spaces, and a last-iteration convergence result to a Nash equilibrium.
Contribution
It shows that the existing MDP-Expert algorithm still applies against a non-oblivious adversary, proposes MDP-OOE, which leverages Double Oracle, and provides the first last-iteration convergence result in OMDPs.
Findings
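MDP-Expert, an algorithm designed for oblivious adversaries, still achieves a policy regret bound against a non-oblivious strategic adversary.
MDP-OOE leverages Double Oracle and targets games where the support size of the Nash equilibrium (NE) is small, including games with large action spaces.
The paper gives the first last-round convergence result to a NE in OMDPs.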
Abstract
We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound that depends on the size of the adversary's pure strategy set and the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound involving a quantity that depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can…
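The abstract assumes an adversary that follows a no-external-regret algorithm. As a rough illustration of what that means, the sketch below implements Hedge (multiplicative weights), a standard no-external-regret method over a finite set of pure strategies. It is not the paper's MDP-Expert or MDP-OOE; the function name `hedge`, its parameters, and the toy loss sequence are all assumptions made for this example, and losses are assumed to lie in [0, 1].

```python
import math


def hedge(loss_rounds, n_actions, eta):
    """Run Hedge on a sequence of loss vectors with entries in [0, 1].

    Returns the mixed strategies played and the external regret, i.e. the
    learner's expected cumulative loss minus that of the best fixed action.
    """
    weights = [1.0] * n_actions
    cumulative = [0.0] * n_actions  # cumulative loss of each fixed action
    expected_loss = 0.0
    strategies = []
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        strategies.append(probs)
        # Expected loss of the mixed strategy in this round.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        for a in range(n_actions):
            cumulative[a] += losses[a]
            # Exponentially down-weight actions that incurred loss.
            weights[a] *= math.exp(-eta * losses[a])
    regret = expected_loss - min(cumulative)
    return strategies, regret


# Toy usage: two actions, a fixed (hypothetical) loss sequence.
T, n = 1000, 2
rounds = [[1.0, 0.0] if t % 3 else [0.0, 1.0] for t in range(T)]
eta = math.sqrt(8 * math.log(n) / T)
strategies, regret = hedge(rounds, n, eta)
print(regret <= math.sqrt(T * math.log(n) / 2))
```

With this step size, the standard analysis of Hedge bounds the external regret by sqrt(T ln(n) / 2), so the average regret per round vanishes as T grows. That sublinear growth is the property "no-external-regret" refers to.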
