Exclusively Penalized Q-learning for Offline Reinforcement Learning

Junghyuk Yeom; Yonghyeon Jo; Jungmo Kim; Sanghyeon Lee; Seungyul Han

arXiv:2405.14082 · cs.LG · October 25, 2024

TL;DR

This paper introduces Exclusively Penalized Q-learning (EPQ), an offline RL method that penalizes only the states prone to inducing estimation errors, which reduces estimation bias in the value function and improves performance in offline control tasks.

Contribution

EPQ selectively penalizes states that are prone to inducing estimation errors, addressing the underestimation bias that penalized value functions can introduce in offline RL, and improves performance over other offline RL methods.

Findings

01 EPQ significantly reduces underestimation bias.

02 EPQ improves performance in offline control tasks.

03 EPQ outperforms other offline RL methods.

Abstract

Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods that penalize the value function: the penalty can introduce unnecessary bias into the value function, leading to underestimation. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.
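
The contrast the abstract draws between penalizing everywhere and penalizing selectively can be illustrated with a small tabular sketch. This is not the EPQ algorithm: the abstract does not specify how states are selected, so the count threshold used here (`min_count`) is a hypothetical stand-in, and `fit_q`, `uniform_penalty`, `selective_penalty`, and the toy dataset are all assumptions made for illustration only.

```python
from collections import Counter


def fit_q(dataset, n_states, n_actions, penalty, gamma=0.9, sweeps=50):
    """Tabular fitted Q-iteration on a fixed dataset.

    `penalty(s, a)` is subtracted from every Bellman target. Pairs that never
    appear in the dataset keep their initial value of 0.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        targets = {}
        for s, a, r, s_next, done in dataset:
            bootstrap = 0.0 if done else gamma * max(q[s_next])
            targets.setdefault((s, a), []).append(r + bootstrap - penalty(s, a))
        new_q = [row[:] for row in q]
        for (s, a), values in targets.items():
            new_q[s][a] = sum(values) / len(values)  # average target per pair
        q = new_q
    return q


def uniform_penalty(alpha):
    # Penalize every (state, action) pair by the same amount.
    return lambda s, a: alpha


def selective_penalty(counts, alpha, min_count):
    # Hypothetical rule: penalize only pairs with few samples in the dataset.
    return lambda s, a: alpha if counts[(s, a)] < min_count else 0.0


# Toy one-step dataset of (state, action, reward, next_state, done) tuples:
# action 0 is well covered (50 samples), action 1 appears only once.
dataset = [(0, 0, 1.0, 0, True)] * 50 + [(0, 1, 5.0, 0, True)]
counts = Counter((s, a) for s, a, _, _, _ in dataset)

q_uniform = fit_q(dataset, 1, 2, uniform_penalty(alpha=2.0))
q_selective = fit_q(dataset, 1, 2, selective_penalty(counts, alpha=2.0, min_count=5))

print("uniform:  ", q_uniform[0])
print("selective:", q_selective[0])
```

In this toy setting the true value of the well-covered action is 1.0. The uniform penalty drags its estimate below that value even though the data supports it, while the selective rule leaves it untouched and penalizes only the poorly covered action.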

Videos

Exclusively Penalized Q-learning for Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Adaptive Dynamic Programming Control · Elevator Systems and Control · Reinforcement Learning in Robotics

Methods: Q-Learning