Exclusively Penalized Q-learning for Offline Reinforcement Learning
Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han

TL;DR
This paper introduces Exclusively Penalized Q-learning (EPQ), an offline RL method that penalizes only the states prone to estimation errors, reducing bias in the value function and improving performance on offline control tasks.
Contribution
EPQ selectively penalizes states that are prone to estimation errors, addressing the underestimation bias that penalized value functions can introduce in offline RL. The authors report that it outperforms existing offline RL methods.
Findings
EPQ significantly reduces underestimation bias.
EPQ improves performance in offline control tasks.
EPQ outperforms other offline RL methods.
Abstract
Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods with penalized value functions: unnecessary bias introduced in the value function can lead to underestimation. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.
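The abstract does not state the rule EPQ uses to decide which states to penalize. As a rough illustration of the general idea only, the tabular sketch below compares a uniform penalty with a selective one. The names `state_penalty_weights` and `penalized_backup`, the count threshold `min_count`, and the toy numbers are all assumptions made for this example. None of them comes from the paper.

```python
import numpy as np

def state_penalty_weights(state_counts, min_count):
    # 1.0 for states the dataset covers poorly, 0.0 otherwise.
    # Hypothetical coverage rule, standing in for the paper's criterion.
    return (np.asarray(state_counts) < min_count).astype(float)

def penalized_backup(q, rewards, next_states, weights, alpha, gamma=0.99):
    # Bellman target for a deterministic tabular MDP:
    # q is (S, A), rewards is (S, A), next_states is an (S, A) integer array.
    target = rewards + gamma * q[next_states].max(axis=-1)
    # Subtract the penalty only in states whose weight is nonzero.
    return target - alpha * weights[:, None]

# Toy problem: state 0 is well covered by the dataset, state 1 is not.
q = np.zeros((2, 2))
rewards = np.array([[1.0, 0.0], [0.0, 1.0]])
next_states = np.array([[0, 1], [0, 1]])
state_counts = np.array([50, 2])

weights = state_penalty_weights(state_counts, min_count=10)
selective = penalized_backup(q, rewards, next_states, weights, alpha=0.5)
uniform = penalized_backup(q, rewards, next_states, np.ones(2), alpha=0.5)
```

In this toy setup the uniform penalty lowers the values of state 0 even though the dataset covers it well, while the selective variant leaves state 0 untouched and penalizes only state 1.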
Taxonomy
Topics: Adaptive Dynamic Programming Control · Elevator Systems and Control · Reinforcement Learning in Robotics
Methods: Q-Learning
