Mildly Constrained Evaluation Policy for Offline Reinforcement Learning
Linjie Xu, Zhengyao Jiang, Jinyu Wang, Lei Song, Jiang Bian

TL;DR
This paper introduces a Mildly Constrained Evaluation Policy (MCEP) for offline RL that uses a less restrictive policy during test time, improving performance across various benchmarks and integrating seamlessly with existing algorithms.
Contribution
The paper proposes MCEP, a novel approach that adjusts constraints for test time inference in offline RL, enhancing performance and compatibility with prior methods.
Findings
MCEP significantly improves performance on D4RL benchmarks.
MCEP can be integrated with existing offline RL algorithms as a plug-in.
Empirical results show MCEP enhances state-of-the-art methods.
Abstract
Offline reinforcement learning (RL) methodologies enforce constraints on the policy to adhere closely to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions during test time. Conventional approaches apply identical constraints for both value learning and test time inference. However, our findings indicate that the constraints suitable for value estimation may in fact be excessively restrictive for action selection during test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test time inference with a more constrained target policy for value estimation. Since the target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP based on TD3BC (Fujimoto & Gu, 2021), AWAC (Nair et…
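The split between a strongly constrained target policy and a mildly constrained evaluation policy can be illustrated with a toy sketch. It assumes a TD3BC-style actor objective (a critic term weighted by a coefficient, plus a squared distance to the dataset action) on a one-dimensional, single-state problem with a fixed quadratic critic. The function `constrained_action`, the weight `lam`, and all numbers are hypothetical choices for illustration, not the authors' implementation.

```python
# Toy illustration only: a one-dimensional, single-state problem with a fixed
# critic. All names and numbers here are hypothetical, not from the paper.


def constrained_action(q_optimum: float, data_action: float, lam: float) -> float:
    """Closed-form minimizer of a TD3BC-style actor loss

        loss(a) = -lam * Q(a) + (a - data_action) ** 2

    for the toy critic Q(a) = -(a - q_optimum) ** 2.
    Setting the derivative 2*lam*(a - q_optimum) + 2*(a - data_action) to zero
    gives the expression below. A larger lam weights the critic more, which
    corresponds to a milder behavior constraint.
    """
    return (lam * q_optimum + data_action) / (lam + 1.0)


Q_OPTIMUM = 1.0    # action the toy critic prefers
DATA_ACTION = 0.0  # action observed in the offline dataset

# Strongly constrained target policy: the one used for value estimation.
target_action = constrained_action(Q_OPTIMUM, DATA_ACTION, lam=0.5)

# Mildly constrained evaluation policy: used only for test-time action selection.
eval_action = constrained_action(Q_OPTIMUM, DATA_ACTION, lam=4.0)

print(f"target policy action:     {target_action:.3f}")
print(f"evaluation policy action: {eval_action:.3f}")
```

In this sketch the evaluation policy moves further from the dataset action toward the critic's preferred action than the target policy does. In a full training loop, the critic would be updated using only the target policy, so the milder constraint would affect action selection without feeding back into value learning.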
Taxonomy
Topics: Behavioral and Psychological Studies · Reinforcement Learning in Robotics
Methods: Test
