Online Learning in MDPs with Partially Adversarial Transitions and Losses
Ofir Schlisselberg, Tal Lancewicki, Yishay Mansour

TL;DR
This paper introduces algorithms for reinforcement learning in Markov Decision Processes with mostly stochastic but occasionally adversarial transitions, providing regret bounds and characterizing the difficulty of learning under such conditions.
Contribution
It proposes the concept of conditioned occupancy measures and develops algorithms with regret bounds for MDPs with partially adversarial transitions, including cases with consecutive adversarial steps and unknown adversarial points.
Findings
Achieves regret $\tilde{O}(H S^{\Lambda}\sqrt{K S A^{\Lambda+1}})$ for $\Lambda$ arbitrarily placed adversarial steps per episode.
Improves the dependence on $S$ to $\tilde{O}(H\sqrt{K S^{3} A^{\Lambda+1}})$ when the $\Lambda$ adversarial steps are consecutive.
Provides regret bounds for fully adversarial MDPs under different feedback models.
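To see how the two bounds compare, drop the constants and logarithmic factors hidden by $\tilde{O}$ and divide: $H S^{\Lambda}\sqrt{K S A^{\Lambda+1}} \,/\, H\sqrt{K S^{3} A^{\Lambda+1}} = S^{\Lambda-1}$, where $\Lambda$ is the number of adversarial steps per episode. The consecutive-step bound therefore matches the general one when $\Lambda = 1$ and is smaller by a factor of $S^{\Lambda-1}$ otherwise. The Python sketch below simply evaluates both expressions; the function names and parameter values are hypothetical and only illustrate the arithmetic.

```python
import math

# Both bounds with the constants and log factors hidden by O-tilde dropped.
# Lam is the number of adversarial steps per episode.

def bound_arbitrary(H, S, A, K, Lam):
    """H * S^Lam * sqrt(K * S * A^(Lam + 1)): arbitrary adversarial steps."""
    return H * S**Lam * math.sqrt(K * S * A**(Lam + 1))

def bound_consecutive(H, S, A, K, Lam):
    """H * sqrt(K * S^3 * A^(Lam + 1)): consecutive adversarial steps."""
    return H * math.sqrt(K * S**3 * A**(Lam + 1))

H, S, A, K = 10, 20, 4, 10**6   # hypothetical problem sizes
for Lam in (1, 2, 3):
    ratio = bound_arbitrary(H, S, A, K, Lam) / bound_consecutive(H, S, A, K, Lam)
    print(f"Lam={Lam}: ratio={ratio:.1f}, S^(Lam-1)={S**(Lam - 1)}")
```

Because the ratio does not involve $H$, $A$ or $K$, any choice of those values shows the same pattern.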
Abstract
We study reinforcement learning in MDPs whose transition function is stochastic at most steps but may behave adversarially at a fixed subset of $\Lambda$ steps per episode. This model captures environments that are stable except at a few vulnerable points. We introduce \emph{conditioned occupancy measures}, which remain stable across episodes even with adversarial transitions, and use them to design two algorithms. The first handles arbitrary adversarial steps and achieves regret $\tilde{O}(H S^{\Lambda}\sqrt{K S A^{\Lambda+1}})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions and $H$ is the episode's horizon. The second, assuming the adversarial steps are consecutive, improves the dependence on $S$ to $\tilde{O}(H\sqrt{K S^{3} A^{\Lambda+1}})$. We further give a regret reduction that removes the need to know which steps are the…
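The motivation for conditioned occupancy measures can be illustrated on a small tabular MDP. The standard occupancy measure $q_h(s,a) = \Pr(s_h = s, a_h = a)$ depends on every transition, so an adversary who changes the transition at one step also changes $q$ at all later steps. The sketch below assumes one plausible reading of a conditioned occupancy measure: the occupancy of the steps after an adversarial transition, conditioned on the state reached right after it. With a single adversarial step, that quantity depends only on the stochastic transitions that follow, so it is unchanged when the adversary acts. This is an illustration under that assumption, not the paper's definition or algorithm; the helper names (`occupancy`, `conditioned_occupancy`, `random_kernel`), the adversarial step index `k`, and the random instance are all hypothetical.

```python
import numpy as np

def occupancy(P, pi, init):
    """Standard occupancy measure q[h, s, a] = Pr(s_h = s, a_h = a).

    P:    shape (H, S, A, S), P[h, s, a, t] = Pr(s_{h+1} = t | s_h = s, a_h = a)
    pi:   shape (H, S, A),    pi[h, s, a] = Pr(a_h = a | s_h = s)
    init: shape (S,),         distribution of the first state
    """
    H, S, A, _ = P.shape
    q = np.zeros((H, S, A))
    d = init.copy()  # state distribution at the current step
    for h in range(H):
        q[h] = d[:, None] * pi[h]               # joint state-action probabilities
        d = np.einsum("sa,sat->t", q[h], P[h])  # push the state distribution forward
    return q

def conditioned_occupancy(P, pi, k, s):
    """Hypothetical helper: occupancy of steps k+1, ..., H-1 given s_{k+1} = s.

    Assumption for illustration only: it conditions on the state reached right
    after the transition at step k, so it never reads P[k].
    """
    S = P.shape[1]
    return occupancy(P[k + 1:], pi[k + 1:], np.eye(S)[s])

rng = np.random.default_rng(0)

def random_kernel(shape):
    """Random row-stochastic array (last axis sums to 1)."""
    x = rng.random(shape)
    return x / x.sum(axis=-1, keepdims=True)

H, S, A, k = 4, 3, 2, 1          # hypothetical sizes; the step-k transition is adversarial
P = random_kernel((H, S, A, S))
pi = random_kernel((H, S, A))    # a fixed stochastic policy
init = np.full(S, 1.0 / S)

P_adv = P.copy()
P_adv[k] = random_kernel((S, A, S))  # the adversary replaces only the step-k transition

q, q_adv = occupancy(P, pi, init), occupancy(P_adv, pi, init)
print("unconditioned q changes after step k:", not np.allclose(q[k + 1:], q_adv[k + 1:]))

same = all(
    np.allclose(conditioned_occupancy(P, pi, k, s), conditioned_occupancy(P_adv, pi, k, s))
    for s in range(S)
)
print("conditioned q unchanged:", same)
```

The design choice here is deliberate: the conditioned quantity is computed by restarting the forward pass from a one-hot state after step `k`, which makes its independence from `P[k]` visible in the code itself.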
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
