Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning
Kwanyoung Park, Youngwoon Lee

TL;DR
This paper introduces Lower Expectile Q-learning (LEQ), a model-based offline RL method that improves value estimation accuracy and outperforms previous approaches on long-horizon and diverse tasks.
Contribution
LEQ employs lower expectile regression of $\lambda$-returns for low-bias value estimation, advancing model-based offline RL with robust performance across various environments.
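To make the idea of lower expectile regression concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it fits a single scalar to a list of sample targets, and the names `expectile_loss`, `fit_expectile`, `steps`, and `lr` are hypothetical. The assumption is the standard expectile loss $|\tau - \mathbb{1}(u < 0)|\,u^2$ with $u$ the target minus the prediction; with $\tau < 0.5$, overestimation is penalized more heavily than underestimation, so the fitted value sits below the mean.

```python
def expectile_loss(prediction, targets, tau):
    """Mean asymmetric squared error of one scalar prediction against sample targets.

    Residuals where the target exceeds the prediction are weighted by tau,
    the others by (1 - tau). With tau < 0.5 this punishes overestimation more.
    """
    total = 0.0
    for y in targets:
        u = y - prediction
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(targets)


def fit_expectile(targets, tau, steps=2000, lr=0.1):
    """Find the tau-expectile of `targets` by gradient descent on expectile_loss.

    `steps` and `lr` are illustrative defaults, not values from the paper.
    """
    q = sum(targets) / len(targets)  # start from the sample mean
    for _ in range(steps):
        grad = 0.0
        for y in targets:
            u = y - q
            weight = tau if u > 0 else 1.0 - tau
            grad += -2.0 * weight * u  # derivative of weight * u^2 w.r.t. q
        q -= lr * grad / len(targets)
    return q


if __name__ == "__main__":
    samples = [0.0, 1.0, 2.0, 10.0]
    print("mean-like (tau=0.5):", fit_expectile(samples, 0.5))
    print("lower expectile (tau=0.1):", fit_expectile(samples, 0.1))
```

For the samples above, $\tau = 0.5$ recovers the mean (3.25), while $\tau = 0.1$ yields a more conservative estimate of about 1.05, which illustrates why a lower expectile can counteract optimistic value targets from an imperfect model.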
Findings
LEQ outperforms previous model-based offline RL methods on long-horizon tasks.
LEQ matches or surpasses model-free and sequence modeling approaches in diverse environments.
Ablation studies confirm the importance of lower expectile regression and critic training on offline data.
Abstract
Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, these approaches often struggle with inaccurate value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which provides a low-bias model-based value estimation via lower expectile regression of $\lambda$-returns. Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks, such as the D4RL AntMaze tasks, matching or surpassing the performance of model-free approaches and sequence modeling approaches. Furthermore, LEQ matches the performance of state-of-the-art model-based and model-free methods in dense-reward environments across both…
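The $\lambda$-returns that serve as regression targets can be sketched as follows. This assumes the standard TD($\lambda$) recursion over a model rollout of horizon $H$, $G^\lambda_t = r_t + \gamma\,[(1-\lambda)\,V(s_{t+1}) + \lambda\,G^\lambda_{t+1}]$ with $G^\lambda_H = V(s_H)$; the paper's exact formulation may differ, and the function name and default values of `gamma` and `lam` are hypothetical.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Compute lambda-returns for one imagined rollout.

    rewards: [r_0, ..., r_{H-1}] predicted along the rollout.
    values:  [V(s_0), ..., V(s_H)] critic estimates, one more entry than rewards.
    Returns a list [G_0, ..., G_{H-1}].
    """
    horizon = len(rewards)
    if len(values) != horizon + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    returns = [0.0] * horizon
    next_return = values[-1]  # bootstrap from the critic at the final state
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap with the longer return, weighted by lam.
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    print(lambda_returns([1.0, 1.0, 1.0], [0.0, 2.0, 4.0, 6.0], gamma=0.5, lam=0.0))
    print(lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
```

Setting `lam=0` reduces each target to the one-step TD target, while `lam=1` gives the discounted sum of rollout rewards bootstrapped only at the final state; intermediate values trade off critic bias against compounding model error.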
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Q-Learning
