Multi-token Markov Game with Switching Costs

Jian Li; Daogao Liu

arXiv:2107.05822 · cs.DS · November 2, 2021

TL;DR

This paper gives a simple index strategy that achieves a constant-factor approximation for a multi-token Markov game when the switching cost is constant and $k=1$, and handles general metric switching costs through a reduction to stochastic $k$-TSP.

Contribution

It presents the first constant-approximation index strategy for Markovian multi-armed bandits with switching costs, for the case where $k=1$ and the switching cost is constant, and a reduction to stochastic $k$-TSP for general metrics.
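
As a reading aid, the objective and the meaning of "constant approximation" can be written out as below. The notation is assumed for illustration and is not taken from the paper: $\pi$ ranges over adaptive strategies, $\mathrm{cost}(\pi)$ is the total (random) cost paid until at least $k$ chains reach their targets, including switching costs, $\pi_{\mathrm{index}}$ is the index strategy, and $\alpha \ge 1$ is a constant that does not depend on the instance.

```latex
\[
  \mathrm{OPT} \;=\; \inf_{\pi}\ \mathbb{E}\big[\mathrm{cost}(\pi)\big],
  \qquad
  \mathbb{E}\big[\mathrm{cost}(\pi_{\mathrm{index}})\big] \;\le\; \alpha \cdot \mathrm{OPT}.
\]
```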

Findings

01. Achieves a constant-factor approximation for $k=1$ when the switching cost is constant.

02. Provides a reduction to stochastic $k$-TSP for general metric switching costs.

03. Extends Gittins-index-style solutions for Markovian bandits to the multi-token Markov game with switching costs.

Abstract

We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least $k$ chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case in which there is no switching cost was solved optimally by Dumitriu, Tetali, and Winkler [DTW03] using a variant of the celebrated Gittins index for the classical multi-armed bandit (MAB) problem with Markovian rewards [Git74, Git79]. However, for the Markovian multi-armed bandit with nontrivial switching costs, even when the switching cost is a constant, the classic paper by Banks and Sundaram [BS94] showed that no index strategy can be optimal. In this paper, we complement their result and show there is a simple index strategy…
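
The game described in the abstract can be made concrete with a small simulator. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the names `play`, `stay`, and `alternate` are hypothetical, each state is assumed to carry a per-move cost, the switching cost is assumed to be a single constant, every chain is assumed to start in state 0, and the first play is assumed to incur no switching cost. Neither example policy is the index strategy from the paper; they only show how switching costs change the total.

```python
import random


def play(chains, targets, move_cost, switch_cost, k, policy, rng):
    """Run one episode and return the total cost until k chains hit their targets.

    chains[i][s] maps each successor state of s in chain i to its probability.
    All modelling choices here are illustrative assumptions, not the paper's.
    """
    states = [0] * len(chains)            # assumption: every chain starts in state 0
    done = [s == t for s, t in zip(states, targets)]
    total, current = 0.0, None
    while sum(done) < k:
        i = policy(done, current)         # policy must return an unfinished chain
        if current is not None and i != current:
            total += switch_cost          # pay only when the played chain changes
        total += move_cost[i][states[i]]  # pay for advancing chain i one step
        successors = list(chains[i][states[i]].keys())
        weights = list(chains[i][states[i]].values())
        states[i] = rng.choices(successors, weights=weights)[0]
        done[i] = states[i] == targets[i]
        current = i
    return total


def stay(done, current):
    """Keep playing the current chain until it finishes, then take the next one."""
    if current is not None and not done[current]:
        return current
    return done.index(False)


def alternate(done, current):
    """Move to the next unfinished chain after every single step."""
    n = len(done)
    start = 0 if current is None else current + 1
    for offset in range(n):
        i = (start + offset) % n
        if not done[i]:
            return i


# Two deterministic chains so the totals can be checked by hand.
chains = [{0: {1: 1.0}, 1: {2: 1.0}}, {0: {1: 1.0}}]
targets = [2, 1]
move_cost = [{0: 1.0, 1: 1.0}, {0: 1.0}]
rng = random.Random(0)

stay_cost = play(chains, targets, move_cost, 5.0, 2, stay, rng)
alt_cost = play(chains, targets, move_cost, 5.0, 2, alternate, rng)
print(stay_cost, alt_cost)  # 8.0 13.0
```

Both policies make the same three moves, but `alternate` switches twice and `stay` only once, so the totals differ by exactly one switching cost. With zero switching cost the two would tie on this instance, which is the regime where, per the abstract, an index rule is optimal.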

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics