Mildly Constrained Evaluation Policy for Offline Reinforcement Learning

Linjie Xu; Zhengyao Jiang; Jinyu Wang; Lei Song; Jiang Bian

arXiv:2306.03680 · cs.LG · June 18, 2024 · 1 citation

TL;DR

This paper introduces a Mildly Constrained Evaluation Policy (MCEP) for offline RL: a less restrictive policy used for action selection at test time, while a more constrained target policy handles value estimation. MCEP improves performance on D4RL benchmarks and integrates with existing algorithms as a plug-in.

Contribution

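The paper proposes MCEP, which decouples the constraint used for test-time action selection from the constraint used for value estimation in offline RL. The milder evaluation-policy constraint improves performance, and because many prior methods already maintain a target policy, MCEP is compatible with them.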

Findings

01

MCEP significantly improves performance on D4RL benchmarks.

02

MCEP can be integrated with existing offline RL algorithms as a plug-in.

03

Empirical results show MCEP enhances state-of-the-art methods.
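As a rough illustration of the plug-in idea, the sketch below assumes an AWAC-style (advantage-weighted) policy extraction on a single state with three discrete actions. Both policies share the same behavior distribution and the same critic advantages; only the temperature differs. A larger temperature keeps the policy closer to the behavior policy (a stronger constraint), so the hypothetical target policy uses temperature 1.0 and the evaluation policy uses 0.2. The closed form pi(a) ∝ pi_beta(a) · exp(A(a) / temperature) is the non-parametric solution of the advantage-weighted objective; a real implementation would instead fit a parametric actor by weighted log-likelihood. All names and numbers here are illustrative assumptions, not taken from the paper or its repository.

```python
import math


def advantage_weighted_policy(behavior_probs, advantages, temperature):
    """Return pi(a) proportional to pi_beta(a) * exp(A(a) / temperature)."""
    # Shift by the max advantage before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    a_max = max(advantages)
    unnormalized = [
        p * math.exp((adv - a_max) / temperature)
        for p, adv in zip(behavior_probs, advantages)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def greedy_action(policy_probs):
    """Index of the most probable action."""
    return max(range(len(policy_probs)), key=lambda i: policy_probs[i])


# Hypothetical single-state example with three discrete actions.
behavior_probs = [0.6, 0.3, 0.1]  # empirical behavior policy pi_beta
advantages = [0.0, 0.5, 1.0]      # A(s, a) from the shared critic

# Target policy: stronger constraint (higher temperature). Per the abstract,
# this is the policy used for value estimation.
target_policy = advantage_weighted_policy(behavior_probs, advantages, temperature=1.0)

# Evaluation policy: milder constraint (lower temperature); only used to act.
eval_policy = advantage_weighted_policy(behavior_probs, advantages, temperature=0.2)

print("target:", [round(p, 3) for p in target_policy], "-> acts", greedy_action(target_policy))
print("eval:  ", [round(p, 3) for p in eval_policy], "-> acts", greedy_action(eval_policy))
```

With these numbers the target policy still prefers the behavior mode (action 0, probability about 0.44), whereas the evaluation policy places about 0.78 of its mass on the highest-advantage action 2, which is the action it would select at test time.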

Abstract

Offline reinforcement learning (RL) methods constrain the policy to stay close to the behavior policy, which stabilizes value learning and mitigates the selection of out-of-distribution (OOD) actions at test time. Conventional approaches apply the same constraint to both value learning and test-time inference. However, our findings indicate that constraints suitable for value estimation may be excessively restrictive for action selection at test time. To address this issue, we propose a Mildly Constrained Evaluation Policy (MCEP) for test-time inference, used alongside a more constrained target policy for value estimation. Since a target policy has been adopted in various prior approaches, MCEP can be seamlessly integrated with them as a plug-in. We instantiate MCEP based on TD3BC (Fujimoto & Gu, 2021), AWAC (Nair et…
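The abstract's central claim, that a constraint strong enough for value estimation can be too strong for action selection, can be seen in a one-state toy with a TD3BC-style actor loss L(a) = -lam · Q(a) + mean_i (a - a_i)^2. This is a minimal sketch under stated assumptions: the critic is a fixed, hypothetical quadratic Q(a) = -(a - 1)^2 (in MCEP the critic would be learned using the target policy), TD3BC's normalization of lam by the mean |Q| is omitted, and the values lam = 0.5 (target policy) and lam = 4.0 (evaluation policy) are made up. A larger lam weakens the behavior-cloning term, i.e. a milder constraint.

```python
def td3bc_style_action(behavior_actions, q_peak, lam, lr=0.05, steps=2000):
    """Minimize L(a) = -lam * Q(a) + mean_i (a - a_i)^2 by gradient descent,
    with the hypothetical critic Q(a) = -(a - q_peak)^2.
    Larger lam means a weaker behavior-cloning constraint."""
    mean_b = sum(behavior_actions) / len(behavior_actions)
    a = mean_b  # start from the behavior mean
    for _ in range(steps):
        dq_da = -2.0 * (a - q_peak)   # derivative of the quadratic critic
        dbc_da = 2.0 * (a - mean_b)   # derivative of mean_i (a - a_i)^2
        a -= lr * (-lam * dq_da + dbc_da)
    return a


behavior_actions = [-0.2, 0.0, 0.2]  # hypothetical dataset actions for one state
q_peak = 1.0                         # action the hypothetical critic prefers

a_target = td3bc_style_action(behavior_actions, q_peak, lam=0.5)  # stronger constraint
a_eval = td3bc_style_action(behavior_actions, q_peak, lam=4.0)    # milder constraint (MCEP)

print(f"target policy action: {a_target:.3f}")
print(f"evaluation policy action: {a_eval:.3f}")
```

Setting dL/da = 0 gives a = (lam · q_peak + mean_b) / (lam + 1), so the target action converges to about 0.333 and the evaluation action to 0.8. Both stay anchored to the data, but the evaluation policy moves much further toward the action the critic prefers.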

Code & Models

Repositories

egg-west/mcep (JAX, official)

Taxonomy

Topics: Behavioral and Psychological Studies · Reinforcement Learning in Robotics

Methods: Test