No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Tiancheng Jin; Junyan Liu; Chloé Rouyer; William Chang; Chen-Yu Wei; Haipeng Luo

arXiv:2305.17380 · cs.LG · October 27, 2023 · 1 citation

TL;DR

This paper introduces algorithms for online reinforcement learning in which both the losses and the transitions are chosen adversarially, with regret bounds that adapt to how adversarial the transitions are and to how difficult the environment is.

Contribution

The work develops the first algorithms that handle both adversarial transitions and adversarial losses with regret bounds that degrade smoothly as the transitions become more adversarial, and it gives a black-box reduction that removes the need to know the level of adversarialness in advance.

Findings

01. Achieves $\tilde{O}(\sqrt{T} + C^{\text{P}})$ regret with adversarial transitions.

02. Provides a black-box reduction removing the need to know $C^{\text{P}}$ beforehand.

03. Adapts to easier environments, achieving improved regret bounds in stochastic-like settings.
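
To make the first finding concrete, the following minimal sketch evaluates the shape of the bound, $\sqrt{T} + C^{\text{P}}$, for a few corruption levels. It is an illustration only, not the paper's algorithm: constants and logarithmic factors hidden by $\tilde{O}$ are dropped, and the function names `regret_bound_shape` and `per_round_bound` are hypothetical helpers introduced here.

```python
import math


def regret_bound_shape(T: int, corruption: float) -> float:
    """Return sqrt(T) + C^P, ignoring constants and log factors.

    `corruption` stands in for C^P; this is an illustrative assumption,
    not a quantity computed the way the paper defines it.
    """
    if T <= 0:
        raise ValueError("T must be a positive number of rounds")
    if corruption < 0:
        raise ValueError("corruption must be non-negative")
    return math.sqrt(T) + corruption


def per_round_bound(T: int, corruption: float) -> float:
    """Average the bound over T rounds; it vanishes only if C^P grows slower than T."""
    return regret_bound_shape(T, corruption) / T


if __name__ == "__main__":
    T = 10_000
    scenarios = [
        ("fixed transitions (C^P = 0)", 0.0),
        ("mild corruption (C^P = sqrt(T))", math.sqrt(T)),
        ("fully adversarial (C^P = T)", float(T)),
    ]
    for label, c in scenarios:
        print(f"{label}: bound ~ {regret_bound_shape(T, c):.1f}, "
              f"per round ~ {per_round_bound(T, c):.4f}")
```

Under these assumptions, the per-round value shrinks with $T$ when $C^{\text{P}}$ is zero or of order $\sqrt{T}$, and stays bounded away from zero when $C^{\text{P}}$ is of order $T$, which matches the impossibility of no-regret learning under fully adversarial transitions.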

Abstract

Existing online learning algorithms for adversarial Markov Decision Processes achieve $O(\sqrt{T})$ regret after $T$ rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed. This is because it has been shown that adversarial transition functions make no-regret learning impossible. Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary. More concretely, we first propose an algorithm that enjoys $\tilde{O}(\sqrt{T} + C^{\text{P}})$ regret where $C^{\text{P}}$ measures how adversarial the transition functions are and can be at most $O(T)$. While this algorithm itself requires knowledge of $C^{\text{P}}$, we further…

Videos

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics