CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

Zeyuan Liu; Kai Yang; Xiu Li

arXiv:2406.07541·cs.LG·June 12, 2024

TL;DR

This paper introduces CDSA, a novel offline reinforcement learning algorithm that leverages denoising score models to improve policy generalization and risk aversion by adjusting actions based on dataset density gradients.

Contribution

We propose CDSA, which decouples conservatism from policy learning and uses score-based models for more accurate action adjustment in offline RL.

Findings

01. Significantly improves baseline performance on D4RL datasets.

02. Demonstrates strong generalizability across different tasks.

03. Increases risk aversion of the learned policies.

Abstract

Distribution shift is a major obstacle in offline reinforcement learning: the discrepancy between the learned policy and the behavior policy must be minimized to avoid overestimating rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policies. In contrast, we propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions. We decouple the conservatism constraints from the policy, so the approach can benefit a wide range of offline RL algorithms. As a consequence, we propose the Conservative Denoising Score-based Algorithm (CDSA), which uses a denoising score-based model to model the gradient of the dataset density, rather than the dataset density itself, and facilitates a more accurate and efficient method…
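The idea of adjusting actions along the gradient of the dataset density can be illustrated with a minimal sketch. This is not the authors' implementation: it replaces the learned denoising score network with the closed-form score of a Gaussian-smoothed empirical density (the function an ideal denoising score model would approximate at noise level `sigma`), and it ignores state conditioning. The names `smoothed_score` and `adjust_action`, the toy dataset, and the values of `sigma`, `step_size`, and `n_steps` are all assumptions made for illustration.

```python
import numpy as np


def smoothed_score(action, dataset_actions, sigma):
    """Gradient of the log of a Gaussian-smoothed empirical density at `action`.

    Assumption: this closed form stands in for a trained denoising score model.
    """
    diffs = dataset_actions - action                  # (N, D) vectors toward each data point
    log_w = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    log_w -= log_w.max()                              # subtract max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()                                      # normalized kernel weights
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2


def adjust_action(action, dataset_actions, sigma=0.5, step_size=0.05, n_steps=20):
    """Move an action toward higher dataset density by following the score."""
    a = np.asarray(action, dtype=float).copy()
    for _ in range(n_steps):
        a = a + step_size * smoothed_score(a, dataset_actions, sigma)
    return a


# Toy data (hypothetical): dataset actions clustered near (0.5, -0.2).
rng = np.random.default_rng(0)
dataset_actions = rng.normal(loc=[0.5, -0.2], scale=0.1, size=(500, 2))

# An out-of-distribution action, e.g. proposed by a pre-trained policy.
raw_action = np.array([1.5, 1.0])
adjusted_action = adjust_action(raw_action, dataset_actions)
```

In this sketch the adjustment is applied after the policy has produced an action, which mirrors the decoupling of conservatism from policy learning described in the abstract: the policy itself is left unchanged, and only its output is pulled toward regions the dataset covers.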

Taxonomy

Topics: Elevator Systems and Control · Reinforcement Learning in Robotics