Variational OOD State Correction for Offline Reinforcement Learning

Ke Jiang; Wen Jiang; Xiaoyang Tan

arXiv:2505.00503 · cs.LG · July 9, 2025

TL;DR

This paper introduces Density-Aware Safety Perception (DASP), a variational method for out-of-distribution (OOD) state correction in offline reinforcement learning. DASP favors actions whose outcomes fall in regions of higher data density, and is validated on the MuJoCo and AntMaze benchmarks.

Contribution

The paper presents DASP, a variational approach to OOD state correction in offline RL that jointly considers the potential outcomes of a decision and the data density of those outcomes.

Findings

01

DASP mitigates state distributional shift in offline RL by steering the agent toward, or back to, in-distribution states.

02

Experimental results show improved performance on MuJoCo and AntMaze.

03

Accounting for both the outcomes of actions and their data density supports safer, more robust decision-making.

Abstract

The performance of offline reinforcement learning is significantly impacted by state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to addressing this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby keeping it within, or returning it to, in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that simultaneously considers both the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of our proposed method through extensive experimental evaluations on the…
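
The density-seeking idea in the abstract can be illustrated with a minimal sketch under strong simplifying assumptions. It scores candidate actions with a plain Gaussian kernel density estimate over dataset states and a hypothetical toy dynamics function. It does not implement the paper's variational objective. All names (`log_density`, `toy_dynamics`, `density_aware_action`) and the bandwidth value are illustrative and do not come from the paper.

```python
import numpy as np


def log_density(query, data, bandwidth=0.5):
    """Gaussian KDE log-density of each query point under `data`,
    up to an additive constant (the normalizer is omitted)."""
    # Squared distances between every query point and every dataset state.
    diff = query[:, None, :] - data[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Numerically stable log-mean-exp over the dataset points.
    logits = -sq_dist / (2.0 * bandwidth ** 2)
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).mean(axis=1, keepdims=True))).ravel()


def toy_dynamics(state, actions):
    """Hypothetical dynamics (an assumption): each action is a displacement
    added to the current state."""
    return state[None, :] + actions


def density_aware_action(state, candidate_actions, dataset_states, bandwidth=0.5):
    """Pick the candidate action whose predicted outcome has the highest
    estimated data density."""
    next_states = toy_dynamics(state, candidate_actions)
    scores = log_density(next_states, dataset_states, bandwidth)
    return candidate_actions[int(np.argmax(scores))], scores


# Synthetic "offline dataset": states clustered around the origin.
rng = np.random.default_rng(0)
dataset_states = rng.normal(loc=0.0, scale=0.3, size=(500, 2))

# An OOD state far from the data, and four candidate displacements.
ood_state = np.array([2.0, 0.0])
candidates = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

best_action, scores = density_aware_action(ood_state, candidates, dataset_states)
print(best_action, scores)
```

In this toy setup the selected action is the one that moves the agent back toward the data cluster, which mirrors the "return to in-distribution regions" behavior the abstract describes. A learned dynamics model and a learned density model would replace the two hand-written functions in any realistic setting.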
