CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning
Zeyuan Liu, Kai Yang, Xiu Li

TL;DR
This paper introduces CDSA, a novel offline reinforcement learning algorithm that leverages denoising score models to improve policy generalization and risk aversion by adjusting actions based on dataset density gradients.
Contribution
We propose CDSA, which decouples conservatism from policy learning and uses score-based models for more accurate action adjustment in offline RL.
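The action-adjustment idea can be illustrated with a minimal sketch built on several assumptions. CDSA itself trains a denoising score network, presumably state-conditioned. Here, purely for illustration, the score is computed in closed form from a Gaussian kernel density estimate over dataset actions, and state conditioning is ignored. The names `kde_score` and `adjust_action`, and the parameters `sigma`, `step` and `n_steps`, are hypothetical. The base policy's action is produced first and then nudged along the gradient of the log dataset density, so the conservatism step stays separate from the policy.

```python
import numpy as np


def kde_score(action, dataset_actions, sigma):
    """Score (gradient of log density) of a Gaussian-smoothed empirical
    distribution over dataset actions, evaluated at `action`:
        grad log p_sigma(a) = sum_i w_i (a_i - a) / sigma^2,
    where w_i is a softmax over -||a - a_i||^2 / (2 sigma^2).
    (Hypothetical stand-in for a learned score network.)"""
    diffs = dataset_actions - action                      # (N, d): a_i - a
    logits = -np.sum(diffs ** 2, axis=1) / (2.0 * sigma ** 2)
    logits -= logits.max()                                # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    return weights @ diffs / sigma ** 2


def adjust_action(policy_action, dataset_actions, sigma=0.1, step=0.005, n_steps=10):
    """Move a base policy's action uphill on the smoothed dataset density.
    The base policy is not modified; the adjustment is applied afterwards."""
    a = np.asarray(policy_action, dtype=float).copy()
    for _ in range(n_steps):
        a = a + step * kde_score(a, dataset_actions, sigma)
    return a


# Usage: dataset actions cluster around `center`; the policy proposes an
# out-of-distribution action, which the adjustment pulls back toward the data.
rng = np.random.default_rng(0)
center = np.array([0.5, -0.5])
dataset_actions = center + 0.05 * rng.standard_normal((500, 2))
start = np.array([0.8, -0.2])
adjusted = adjust_action(start, dataset_actions)
print(np.linalg.norm(start - center), np.linalg.norm(adjusted - center))
```

The step size is set to half of `sigma**2`, so each update moves the action only part of the way toward the locally weighted mean of nearby dataset actions. This avoids overshooting the cluster.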
Findings
Significantly improves the performance of baseline algorithms on D4RL datasets.
Demonstrates strong generalizability across different tasks.
Increases risk aversion of the learned policies.
Abstract
Distribution shift is a major obstacle in offline reinforcement learning. It necessitates minimizing the discrepancy between the learned policy and the behavior policy to avoid overestimating rare or unseen actions. Previous conservative offline RL algorithms struggle to generalize to unseen actions, despite their success in learning good in-distribution policies. In contrast, we propose to use the gradient fields of the dataset density generated from a pre-trained offline RL algorithm to adjust the original actions. Because we decouple the conservatism constraints from the policy, the approach can benefit a wide range of offline RL algorithms. Building on this idea, we propose the Conservative Denoising Score-based Algorithm (CDSA). CDSA uses a denoising score-based model to model the gradient of the dataset density, rather than the dataset density itself, and facilitates a more accurate and efficient method…
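Modeling the gradient of the density rather than the density itself is what the standard denoising score matching objective does. That objective perturbs data with Gaussian noise and regresses a model onto -(x_noisy - x) / sigma^2. The following is a minimal toy sketch, not CDSA's network or training setup. It assumes 1-D Gaussian data and a linear score model fitted by least squares; the variables `mu`, `tau`, `sigma`, `w` and `b` are illustrative. For this data the true score of the noisy marginal is linear, so the fit can be checked against the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "dataset": x ~ N(mu, tau^2).
mu, tau, sigma = 1.0, 0.5, 0.5
n = 200_000
x = mu + tau * rng.standard_normal(n)

# Denoising score matching: perturb the data and regress the model output
# onto -(x_noisy - x) / sigma^2, which equals -z / sigma.
z = rng.standard_normal(n)
x_noisy = x + sigma * z
target = -z / sigma

# Linear score model s(x) = w * x + b. The DSM loss is a squared error,
# so ordinary least squares gives its minimizer for this model class.
design = np.column_stack([x_noisy, np.ones(n)])
(w, b), *_ = np.linalg.lstsq(design, target, rcond=None)

# The noisy marginal is N(mu, tau^2 + sigma^2), whose true score is
# -(x - mu) / (tau^2 + sigma^2); with these values, w = -2 and b = 2.
var = tau ** 2 + sigma ** 2
print(f"fitted w={w:.3f}, b={b:.3f}; true w={-1 / var:.3f}, b={mu / var:.3f}")
```

The model never sees the density itself. It learns the density's gradient directly from noisy samples, which is the property the abstract attributes to the score-based approach.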
Taxonomy
Topics: Elevator Systems and Control · Reinforcement Learning in Robotics
