Online Markov Decision Processes with Non-oblivious Strategic Adversary

Le Cong Dinh; David Henry Mguni; Long Tran-Thanh; Jun Wang; Yaodong Yang

arXiv:2110.03604·cs.LG·January 31, 2023

TL;DR

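This paper studies Online Markov Decision Processes (OMDPs) whose loss function is chosen by a non-oblivious strategic adversary following a no-external-regret algorithm. It gives policy regret bounds for MDP-Expert and for a new algorithm, MDP-OOE, aimed at large action spaces, along with a last-round convergence result to a Nash equilibrium (NE).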

Contribution

It shows that MDP-Expert, an existing algorithm for oblivious adversaries, still applies against a non-oblivious strategic adversary; proposes MDP-Online Oracle Expert (MDP-OOE), which leverages Double Oracle; and provides the first last-iteration convergence result in OMDPs.

Findings

01

MDP-Expert, which works well with oblivious adversaries, still achieves a policy regret bound against a non-oblivious strategic adversary.

02

MDP-OOE handles large action spaces by leveraging Double Oracle; its policy regret bound depends on $k$, which depends only on the support size of the NE, in place of $|A|$.

03

First last-round convergence result to a Nash equilibrium (NE) in OMDPs.

Abstract

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still apply and achieve a policy regret bound of $O(\sqrt{T \log(L)} + \tau^{2} \sqrt{T \log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $O(\sqrt{T \log(L)} + \tau^{2} \sqrt{T k \log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can…
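To make the phrase "no-external-regret algorithm" concrete, the sketch below runs Hedge (multiplicative weights), one standard algorithm with that property, on a random loss sequence and compares its external regret with the textbook bound. This is an illustration only: the abstract does not say which algorithm the adversary uses, and the names `hedge`, `external_regret`, the learning rate `eta`, and the random losses are assumptions introduced here, not taken from the paper.

```python
import math
import random


def hedge(loss_rounds, n_actions, eta):
    """Return Hedge's cumulative expected loss on loss vectors in [0, 1]."""
    weights = [1.0] * n_actions
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the mixed strategy played this round.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Exponentially down-weight actions that incurred loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return expected_loss


def external_regret(loss_rounds, n_actions, eta):
    """Hedge's loss minus the loss of the best fixed action in hindsight."""
    best_fixed = min(
        sum(losses[a] for losses in loss_rounds) for a in range(n_actions)
    )
    return hedge(loss_rounds, n_actions, eta) - best_fixed


# Hypothetical setup: T rounds, N actions, i.i.d. uniform losses.
T, N = 2000, 5
rng = random.Random(0)
loss_rounds = [[rng.random() for _ in range(N)] for _ in range(T)]

eta = math.sqrt(8 * math.log(N) / T)      # standard tuning for a known horizon
regret = external_regret(loss_rounds, N, eta)
bound = math.sqrt(T * math.log(N) / 2)    # textbook Hedge regret bound
print(f"regret = {regret:.2f}, bound = {bound:.2f}")
```

With this tuning, the regret grows sublinearly in the number of rounds, so the average per-round regret vanishes; that sublinear growth is what "no external regret" means.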
