Online Learning in MDPs with Partially Adversarial Transitions and Losses

Ofir Schlisselberg; Tal Lancewicki; Yishay Mansour

arXiv:2602.09474·cs.LG·February 11, 2026

Open Access

TL;DR

This paper introduces algorithms for reinforcement learning in Markov Decision Processes with mostly stochastic but occasionally adversarial transitions, providing regret bounds and characterizing the difficulty of learning under such conditions.

Contribution

It proposes the concept of conditioned occupancy measures and develops algorithms with regret bounds for MDPs with partially adversarial transitions, including cases with consecutive adversarial steps and unknown adversarial points.

Findings

01

Achieves regret $\tilde{O}(H S^{Λ} K S A^{Λ+1})$ for arbitrary adversarial steps.

02

Improves the regret's dependence on $S$ to $\tilde{O}(H K S^{3} A^{Λ+1})$ when the adversarial steps are consecutive.

03

Provides regret bounds for fully adversarial MDPs under different feedback models.

Abstract

We study reinforcement learning in MDPs whose transition function is stochastic at most steps but may behave adversarially at a fixed subset of $Λ$ steps per episode. This model captures environments that are stable except at a few vulnerable points. We introduce conditioned occupancy measures, which remain stable across episodes even with adversarial transitions, and use them to design two algorithms. The first handles arbitrary adversarial steps and achieves regret $\tilde{O}(H S^{Λ} K S A^{Λ+1})$, where $K$ is the number of episodes, $S$ is the number of states, $A$ is the number of actions, and $H$ is the horizon of each episode. The second, assuming the adversarial steps are consecutive, improves the dependence on $S$ to $\tilde{O}(H K S^{3} A^{Λ+1})$. We further give a $K^{2/3}$-regret reduction that removes the need to know which steps are the…
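
To make the setting in the abstract concrete, the following is a minimal sketch of a single episode in which most steps draw the next state from a fixed stochastic kernel while a fixed set of steps is controlled by an adversary. It is an illustration only, not the paper's algorithm: the names `run_episode`, `kernels`, `adversarial_steps`, `adversary`, and `policy` are hypothetical, and letting the adversary pick the next state directly is a simplifying assumption.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Kernel = Dict[Tuple[int, int], Sequence[float]]  # (state, action) -> next-state distribution
Adversary = Callable[[int, int, int], int]       # (step, state, action) -> next state
Policy = Callable[[int, int], int]               # (step, state) -> action


def run_episode(
    horizon: int,
    kernels: List[Kernel],
    adversarial_steps: Set[int],
    adversary: Adversary,
    policy: Policy,
    rng: random.Random,
    start_state: int = 0,
) -> List[Tuple[int, int, int, int]]:
    """Simulate one episode; returns (step, state, action, next_state) tuples."""
    state = start_state
    trajectory = []
    for h in range(horizon):
        action = policy(h, state)
        if h in adversarial_steps:
            # Assumption: the adversary chooses the next state outright.
            next_state = adversary(h, state, action)
        else:
            # Stochastic step: sample from the fixed kernel for step h.
            probs = kernels[h][(state, action)]
            next_state = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        trajectory.append((h, state, action, next_state))
        state = next_state
    return trajectory


# Toy instance: 2 states, 2 actions, horizon 4, one adversarial step (step 1).
rng = random.Random(0)
kernels = [{(s, a): [1.0, 0.0] for s in range(2) for a in range(2)} for _ in range(4)]
trajectory = run_episode(
    horizon=4,
    kernels=kernels,
    adversarial_steps={1},
    adversary=lambda h, s, a: 1,  # always pushes the learner to state 1
    policy=lambda h, s: 0,        # always plays action 0
    rng=rng,
)
print(trajectory)
```

In this toy instance the stochastic kernels always return state 0, so the only visit to state 1 is the one forced at the adversarial step. The number of entries in `adversarial_steps` plays the role of $Λ$.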

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms