Bootstrapping Expectiles in Reinforcement Learning

Pierre Clavier; Emmanuel Rachelson; Erwan Le Pennec; Matthieu Geist

arXiv:2406.04081 · cs.LG · June 7, 2024

TL;DR

This paper introduces ExpectRL, a reinforcement learning method that replaces the expectation in the Bellman operator with an expectile. The resulting pessimism reduces overestimation and improves robustness to changes in the environment.

Contribution

The paper proposes ExpectRL, a simple modification of the critic loss: the $L_{2}$ loss is replaced with a more general expectile loss. Combined with automatic selection of the expectile, this mitigates overestimation and improves robustness.

Findings

01. On the overestimation problem, ExpectRL provides better results than a classic twin-critic approach.

02. ExpectRL is more robust on benchmarks where the environment changes.

03. A variant of ExpectRL combined with domain randomization is competitive with state-of-the-art robust RL agents.

Abstract

Many classic Reinforcement Learning (RL) algorithms rely on a Bellman operator, which involves an expectation over the next states, leading to the concept of bootstrapping. To introduce a form of pessimism, we propose to replace this expectation with an expectile. In practice, this can be very simply done by replacing the $L_{2}$ loss with a more general expectile loss for the critic. Introducing pessimism in RL is desirable for various reasons, such as tackling the overestimation problem (for which classic solutions are double Q-learning or the twin-critic approach of TD3) or robust RL (where transitions are adversarial). We study empirically these two cases. For the overestimation problem, we show that the proposed approach, ExpectRL, provides better results than a classic twin-critic. On robust RL benchmarks, involving changes of the environment, we show that our approach is more…
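The loss swap described in the abstract can be illustrated with a small sketch. It assumes the standard expectile loss $L_{\tau}(u) = |\tau - \mathbb{1}\{u < 0\}|\, u^{2}$ with residual $u = \text{target} - q$, under which $\tau = 0.5$ recovers (half of) the $L_{2}$ loss and $\tau < 0.5$ penalizes overestimation more heavily; the paper's exact convention may differ. The function names, the scalar "critic" `q`, and the synthetic targets are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random


def expectile_loss(q, targets, tau):
    """Mean asymmetric squared loss of a scalar estimate q against targets.

    Assumed convention: u = target - q. Overestimation (u < 0) is weighted
    by (1 - tau), underestimation by tau.
    """
    total = 0.0
    for y in targets:
        u = y - q
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(targets)


def fit_expectile(targets, tau, steps=500, lr=0.2):
    """Minimize the expectile loss over a single scalar by gradient descent."""
    q = sum(targets) / len(targets)  # start from the sample mean
    for _ in range(steps):
        grad = 0.0
        for y in targets:
            u = y - q
            weight = (1.0 - tau) if u < 0 else tau
            grad += -2.0 * weight * u  # derivative of weight * u^2 w.r.t. q
        q -= lr * grad / len(targets)
    return q


# Synthetic stand-ins for bootstrapped targets (hypothetical data).
rng = random.Random(0)
targets = [rng.gauss(1.0, 1.0) for _ in range(1000)]

mean_estimate = fit_expectile(targets, tau=0.5)         # behaves like the L2 fit
pessimistic_estimate = fit_expectile(targets, tau=0.3)  # lands below the mean
```

In this toy setting, the fit with `tau=0.5` stays at the sample mean, while `tau=0.3` settles on a lower value, which is the kind of pessimism the abstract refers to. In an actual critic, `targets` would be bootstrapped values and `q` the output of a function approximator.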

Taxonomy

Topics: Evolutionary Algorithms and Applications

Methods: Q-Learning · Double Q-learning