Near-Optimal Adversarial Reinforcement Learning with Switching Costs
Ming Shi, Yingbin Liang, Ness Shroff

TL;DR
This paper addresses the challenge of developing efficient algorithms for adversarial reinforcement learning with switching costs, establishing fundamental regret lower bounds and proposing near-optimal algorithms for both known and unknown transition functions.
Contribution
It introduces the first regret lower bounds for adversarial RL with switching costs and proposes algorithms that nearly match these bounds, advancing understanding of this complex setting.
Findings
Regret lower bound of Ω̃((HSA)^{1/3} T^{2/3}), where T, S, A, and H are the numbers of episodes, states, actions, and layers per episode
Proposed algorithms match the lower bound when the transition is known
Algorithms achieve near-optimal regret when the transition is unknown
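To get a feel for how the lower bound scales, the short Python sketch below evaluates the leading term (HSA)^{1/3} T^{2/3}. It is only an illustration and is not taken from the paper: the function name `lower_bound_scale` is hypothetical, constants and logarithmic factors hidden by the tilde notation are ignored, and H, S, A, and T are assumed to be the numbers of layers per episode, states, actions, and episodes.

```python
def lower_bound_scale(H: int, S: int, A: int, T: int) -> float:
    """Leading term (H*S*A)^(1/3) * T^(2/3) of the regret lower bound.

    Hypothetical helper for illustration only: constants and log factors
    hidden by the tilde-Omega notation are dropped.
    """
    if min(H, S, A, T) <= 0:
        raise ValueError("H, S, A, and T must all be positive")
    return (H * S * A) ** (1.0 / 3.0) * T ** (2.0 / 3.0)


if __name__ == "__main__":
    # Example sizes chosen arbitrarily so that H * S * A = 8.
    base = lower_bound_scale(H=2, S=2, A=2, T=1000)
    doubled = lower_bound_scale(H=2, S=2, A=2, T=2000)
    print(f"T=1000: {base:.2f}")
    print(f"T=2000: {doubled:.2f}")
    # Doubling T multiplies the term by 2^(2/3), so growth is sublinear in T.
    print(f"ratio:  {doubled / base:.4f}")
```

With H·S·A = 8 and T = 1000 the term is about 200, and doubling T multiplies it by 2^{2/3} (roughly 1.59) rather than 2, which is the sense in which the bound is sublinear in T yet grows faster than a √T rate.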
Abstract
Switching costs, which capture the costs for changing policies, are regarded as a critical metric in reinforcement learning (RL), in addition to the standard metric of losses (or rewards). However, existing studies on switching costs (with a coefficient that is strictly positive and is independent of T) have mainly focused on static RL, where the loss distribution is assumed to be fixed during the learning process, and thus practical scenarios where the loss distribution could be non-stationary or even adversarial are not considered. While adversarial RL better models this type of practical scenario, an open problem remains: how to develop a provably efficient algorithm for adversarial RL with switching costs? This paper makes the first effort towards solving this problem. First, we provide a regret lower-bound that shows that the regret of any algorithm must be larger than…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
