TL;DR
This paper introduces DASP, a variational method for out-of-distribution (OOD) state correction in offline reinforcement learning. It steers the agent toward actions whose outcomes lie in high-density regions of the data, and is evaluated on the MuJoCo and AntMaze benchmarks.
Contribution
The paper presents a novel variational approach called DASP that integrates data density and outcome considerations for improved OOD state correction in offline RL.
Findings
DASP mitigates state distributional shift in offline RL by favoring actions that lead to high-density, in-distribution outcomes.
Experimental results show improved performance on MuJoCo and AntMaze.
The method enhances safety and robustness in decision-making.
Abstract
The performance of offline reinforcement learning is significantly impacted by state distributional shift, and out-of-distribution (OOD) state correction is a popular approach to address this problem. In this paper, we propose a novel method named Density-Aware Safety Perception (DASP) for OOD state correction. Specifically, our method encourages the agent to prioritize actions that lead to outcomes with higher data density, thereby keeping it within, or returning it to, in-distribution (safe) regions. To achieve this, we optimize the objective within a variational framework that jointly considers the potential outcomes of decision-making and their density, thus providing crucial contextual information for safe decision-making. Finally, we validate the effectiveness and feasibility of our proposed method through extensive experimental evaluations on the…
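The core idea in the abstract, preferring actions whose outcomes fall where the dataset is dense, can be illustrated with a small sketch. This is not the paper's variational objective: it swaps in a plain Gaussian kernel density estimate over dataset states, and the additive `toy_dynamics` function, the bandwidth, and every name below are hypothetical choices made for illustration only.

```python
import numpy as np


def log_kde(query, data, bandwidth=0.5):
    """Log of a Gaussian kernel density estimate of `data`, evaluated at `query`.

    query: (m, d) array of points to score; data: (n, d) array of dataset states.
    This KDE is a stand-in for a learned density model (an assumption).
    """
    n, d = data.shape
    diff = query[:, None, :] - data[None, :, :]                 # (m, n, d)
    neg_sq = -np.sum(diff ** 2, axis=-1) / (2 * bandwidth ** 2)  # (m, n)
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth ** 2) - np.log(n)
    # Log-sum-exp over dataset points for numerical stability.
    mx = np.max(neg_sq, axis=1, keepdims=True)
    return log_norm + mx[:, 0] + np.log(np.sum(np.exp(neg_sq - mx), axis=1))


def toy_dynamics(state, actions):
    """Hypothetical deterministic dynamics: next state = state + action."""
    return state + actions


def pick_density_aware_action(state, candidate_actions, dataset_states):
    """Return the candidate whose predicted outcome has the highest log-density."""
    next_states = toy_dynamics(state, candidate_actions)
    scores = log_kde(next_states, dataset_states)
    return candidate_actions[np.argmax(scores)], scores


# Dataset states form a small grid around the origin (the "safe" region).
axis = np.linspace(-0.5, 0.5, 5)
dataset_states = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)

# The agent sits at an OOD state and compares three candidate actions.
state = np.array([2.0, 0.0])
candidate_actions = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

best_action, scores = pick_density_aware_action(
    state, candidate_actions, dataset_states
)
print(best_action, scores)
```

In this toy setup the action that moves the agent back toward the origin receives the highest score, since its predicted next state is closest to the dataset grid. A practical method would need a learned dynamics or outcome model and a density estimator that scales to high-dimensional states, which is the role the abstract attributes to the variational framework.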