Multi-token Markov Game with Switching Costs
Jian Li, Daogao Liu

TL;DR
This paper gives a simple index strategy that achieves a constant-factor approximation to the optimal cost in a multi-token Markov game with switching costs, extending the index-based solution of the no-switching-cost case to settings where changing chains is costly.
Contribution
It presents the first constant-approximation index strategy for Markovian multi-armed bandits with switching costs when $k=1$, and a reduction to stochastic $k$-TSP for general metrics.
Findings
Achieves constant approximation for $k=1$ with constant switching costs.
Provides a reduction to stochastic $k$-TSP for general metrics.
Extends bandit problem solutions to Markov games with switching costs.
Abstract
We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least $k$ chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case in which there is no switching cost was solved optimally by Dumitriu, Tetali, and Winkler~\cite{DTW03} by a variant of the celebrated Gittins Index for the classical multi-armed bandit (MAB) problem with Markovian rewards \cite{Git74,Git79}. However, for the Markovian multi-armed bandit with nontrivial switching cost, even if the switching cost is a constant, the classic paper by Banks and Sundaram \cite{BS94} showed that no index strategy can be optimal. In this paper, we complement their result and show there is a simple index strategy…
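To make the game in the abstract concrete, the following is a minimal simulation sketch. It is not the paper's index strategy: it evaluates a naive baseline that plays the chains in a fixed order and stays on each one until it reaches its target. The names `simulate` and `estimate_cost`, the chain encoding, the unit cost per move, the start state 0, the single constant `switch_cost`, and the choice not to charge for selecting the first chain are all assumptions made for illustration.

```python
import random


def simulate(chains, targets, k, switch_cost, order, rng):
    """Cost of one play-through of a naive 'finish one chain at a time' policy.

    chains[i] maps a state to a list of (next_state, probability) pairs.
    Assumptions (not from the paper): every chain starts in state 0, each
    move costs 1, switching to a different chain costs `switch_cost`, and
    selecting the very first chain is free.
    """
    cost = 0.0
    finished = 0
    current = None
    for i in order:
        if finished >= k:
            break
        if current is not None and i != current:
            cost += switch_cost  # pay for moving to a different chain
        current = i
        state = 0
        while state != targets[i]:
            next_states, probs = zip(*chains[i][state])
            state = rng.choices(next_states, weights=probs)[0]
            cost += 1.0  # unit cost per advance of the chosen chain
        finished += 1
    return cost


def estimate_cost(chains, targets, k, switch_cost, order, trials=10_000, seed=0):
    """Monte Carlo estimate of the policy's expected total cost."""
    rng = random.Random(seed)
    total = sum(
        simulate(chains, targets, k, switch_cost, order, rng)
        for _ in range(trials)
    )
    return total / trials


# Two deterministic chains: state 0 moves to the target state 1 in one step.
deterministic = {0: [(1, 1.0)]}
chains = [deterministic, deterministic]
targets = [1, 1]

cost_k1 = simulate(chains, targets, 1, 3.0, [0, 1], random.Random(0))
cost_k2 = simulate(chains, targets, 2, 3.0, [0, 1], random.Random(0))
```

In this toy instance, finishing one chain costs a single move, while finishing both costs two moves plus one switch. An adaptive strategy, such as an index strategy, would instead decide after every move whether to keep advancing the current chain or to pay the switching cost and move on.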
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
