No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions
Tiancheng Jin, Junyan Liu, Chloé Rouyer, William Chang, Chen-Yu Wei, Haipeng Luo

TL;DR
This paper introduces algorithms for online reinforcement learning in which both the losses and the transitions can be adversarial, with regret bounds that adapt to how adversarial the transitions are and to how easy the environment is.
Contribution
The work develops the first algorithms that handle both adversarial transitions and adversarial losses, with regret bounds that grow smoothly with the degree of adversarial behavior, and it includes a black-box reduction that removes the need for prior knowledge of that degree.
Findings
Achieves $\tilde{O}(\sqrt{T} + C^{\text{P}})$ regret with adversarial transitions.
Provides a black-box reduction removing the need to know $C^{\text{P}}$ beforehand.
Adapts to easier environments, achieving improved regret bounds in stochastic-like settings.
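To make the shape of the first bound concrete, here is a minimal sketch that evaluates the order of $\sqrt{T} + C^{\text{P}}$ for a few corruption levels. It is only an illustration, not the paper's algorithm: the function names `regret_bound_order` and `per_round_bound` are hypothetical, constants and logarithmic factors are dropped, and the three corruption levels (zero, $T^{0.75}$, and $T$) are assumptions chosen for the example.

```python
import math


def regret_bound_order(T: int, c_p: float) -> float:
    """Order of the sqrt(T) + C^P bound, ignoring constants and log factors.

    T is the number of rounds; c_p is an assumed value of the corruption
    measure C^P (hypothetical input, not computed from any environment).
    """
    if T <= 0:
        raise ValueError("T must be positive")
    if c_p < 0:
        raise ValueError("c_p must be non-negative")
    return math.sqrt(T) + c_p


def per_round_bound(T: int, c_p: float) -> float:
    """Bound divided by T; no-regret learning needs this to vanish as T grows."""
    return regret_bound_order(T, c_p) / T


if __name__ == "__main__":
    for T in (10**2, 10**4, 10**6):
        fixed = per_round_bound(T, 0.0)       # fixed transitions, C^P = 0
        mild = per_round_bound(T, T ** 0.75)  # sublinear corruption (assumed)
        worst = per_round_bound(T, float(T))  # corruption of order T (assumed)
        print(f"T={T:>7}  fixed={fixed:.4f}  mild={mild:.4f}  worst={worst:.4f}")
```

Under these assumptions, the per-round value shrinks toward zero whenever the corruption grows sublinearly in $T$, and stays bounded away from zero when the corruption is of order $T$, which is consistent with the known impossibility of no-regret learning under fully adversarial transitions.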
Abstract
Existing online learning algorithms for adversarial Markov Decision Processes achieve $O(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\tilde{O}(\sqrt{T} + C^{\text{P}})$ regret, where $C^{\text{P}}$ measures how adversarial the transition functions are and can be at most $O(T)$. While this algorithm itself requires knowledge of $C^{\text{P}}$, we further…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics
