Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning

Kwanyoung Park; Youngwoon Lee

arXiv:2407.00699·cs.LG·December 4, 2024·1 cites

TL;DR

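This paper introduces Lower Expectile Q-learning (LEQ), a model-based offline RL method that reduces bias in value estimates from model rollouts. It outperforms previous model-based offline RL methods on long-horizon tasks such as D4RL AntMaze and matches state-of-the-art methods in dense-reward environments.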

Contribution

LEQ employs lower expectile regression of $λ$-returns for low-bias value estimation, advancing model-based offline RL with robust performance across various environments.
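To make "lower expectile regression" concrete, here is a minimal sketch of the generic asymmetric squared loss and a scalar fit. It is not taken from the paper's code: the function names `expectile_loss` and `fit_expectile`, the default `tau=0.1`, and the fixed-point solver are all illustrative assumptions. With `tau` below 0.5, overestimates are penalized more heavily than underestimates, so the fitted value sits below the mean of the targets.

```python
def expectile_loss(pred, targets, tau=0.1):
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2 with u = target - pred.

    tau < 0.5 gives a lower expectile: predictions above a target (u < 0)
    are weighted by (1 - tau), predictions below it by tau.
    """
    total = 0.0
    for y in targets:
        u = y - pred
        weight = (1.0 - tau) if u < 0 else tau
        total += weight * u * u
    return total / len(targets)


def fit_expectile(targets, tau=0.1, iters=100):
    """Find the scalar minimizing expectile_loss by iterated weighted means.

    Hypothetical helper for illustration; a critic network would instead
    minimize the same loss by gradient descent.
    """
    q = sum(targets) / len(targets)  # start from the ordinary mean
    for _ in range(iters):
        weights = [(1.0 - tau) if y < q else tau for y in targets]
        q = sum(w * y for w, y in zip(weights, targets)) / sum(weights)
    return q


targets = [0.0, 1.0, 2.0, 3.0, 4.0]
mean_fit = fit_expectile(targets, tau=0.5)   # tau = 0.5 recovers the mean
lower_fit = fit_expectile(targets, tau=0.1)  # pessimistic estimate
```

Setting `tau=0.5` reduces the loss to ordinary mean-squared error, which is a convenient sanity check on any implementation.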

Findings

01 LEQ outperforms previous model-based offline RL methods on long-horizon tasks.

02 LEQ matches or surpasses model-free and sequence modeling approaches in diverse environments.

03 Ablation studies confirm the importance of lower expectile regression and critic training on offline data.

Abstract

Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, these approaches often struggle with inaccurate value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which provides a low-bias model-based value estimation via lower expectile regression of $λ$-returns. Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks, such as the D4RL AntMaze tasks, matching or surpassing the performance of model-free approaches and sequence modeling approaches. Furthermore, LEQ matches the performance of state-of-the-art model-based and model-free methods in dense-reward environments across both…
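The $λ$-returns mentioned in the abstract can be illustrated with the standard recursive definition, $G^λ_t = r_t + γ[(1-λ)V(s_{t+1}) + λG^λ_{t+1}]$ with $G^λ_H = V(s_H)$. The sketch below assumes that textbook form; the paper's exact formulation may differ, and the function name `lambda_returns` and its default `gamma` and `lam` values are illustrative choices, not the authors' settings.

```python
def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Compute lambda-returns for one imagined rollout of length H.

    rewards: list of H rewards r_0 .. r_{H-1}
    values:  list of H + 1 value estimates V(s_0) .. V(s_H);
             values[H] bootstraps the tail of the rollout.
    """
    horizon = len(rewards)
    assert len(values) == horizon + 1, "need one more value than rewards"
    returns = [0.0] * horizon
    next_return = values[horizon]  # G_H = V(s_H)
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap with the longer-horizon return.
        blended = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + gamma * blended
        returns[t] = next_return
    return returns


# lam = 1 gives the discounted sum of rewards plus the bootstrapped tail.
mc_returns = lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
# lam = 0 gives one-step TD targets r_t + gamma * V(s_{t+1}).
td_returns = lambda_returns([1.0, 1.0], [0.0, 2.0, 4.0], gamma=0.5, lam=0.0)
```

Intermediate values of `lam` trade off reliance on the learned value function against reliance on the (possibly inaccurate) model rollout.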

Videos

Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning· slideslive

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Q-Learning