Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching

H.J. Terry Suh; Glen Chou; Hongkai Dai; Lujie Yang; Abhishek Gupta; Russ Tedrake

arXiv:2306.14079 · cs.LG · October 18, 2023 · 1 citation

TL;DR

This paper introduces Score-Guided Planning (SGP), an offline RL method that uses gradients of a data-uncertainty measure, learned via diffusion score matching, so that first-order optimization can search high-dimensional spaces efficiently while limiting model bias.

Contribution

It proposes a new approach to estimate data uncertainty through gradients learned with score matching, enabling scalable first-order planning in offline RL.

Findings

01. SGP outperforms zeroth-order methods in high-dimensional tasks.

02. Gradients learned by score matching make uncertainty minimization efficient.

03. The approach mitigates model bias and reduces the tendency to get stuck in local minima.

Abstract

Gradient-based methods enable efficient search in high dimensions. However, to apply them effectively in offline optimization paradigms such as offline Reinforcement Learning (RL) or Imitation Learning (IL), we need to consider more carefully how uncertainty estimation interacts with first-order methods that attempt to minimize it. We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties: (i) it allows gradient-based methods that attempt to minimize uncertainty to drive iterates to data as smoothing is annealed, and (ii) it facilitates analysis of model bias with Lipschitz constants. As distance to data can be expensive to compute online, we consider settings where we need to amortize this computation. Instead of learning the distance, however, we propose to learn its gradients directly as an oracle for…
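Property (i) can be illustrated with a small numerical sketch. The abstract as excerpted here does not give the paper's exact definition of smoothed distance to data, so the softmin form below is an assumption chosen for illustration, and the function names, step size, and smoothing schedule are hypothetical. Under this form, the gradient equals the negative score of the Gaussian-perturbed empirical data distribution scaled by sigma squared, which is the kind of quantity score matching estimates; here it is computed in closed form from the dataset rather than learned.

```python
import numpy as np


def smoothed_distance(x, data, sigma):
    """Softmin of half squared distances from x to each data point.

    ASSUMED form (not taken from the paper):
        d_sigma(x) = -sigma^2 * log( mean_i exp(-||x - x_i||^2 / (2 sigma^2)) )
    As sigma -> 0 this approaches half the squared distance to the nearest point.
    """
    sq_dists = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma**2)
    m = logits.max()  # subtract the max for a numerically stable log-mean-exp
    return -sigma**2 * (m + np.log(np.mean(np.exp(logits - m))))


def smoothed_distance_grad(x, data, sigma):
    """Closed-form gradient of smoothed_distance with respect to x.

    Equals x minus a softmax-weighted average of the data points, i.e.
    -sigma^2 times the score of the Gaussian-smoothed empirical distribution.
    """
    sq_dists = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dists / (2.0 * sigma**2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return x - weights @ data


def anneal_to_data(x0, data, sigmas=(2.0, 1.0, 0.5, 0.1), steps=50, lr=0.5):
    """Gradient descent on the smoothed distance while sigma is annealed.

    The schedule, step count, and learning rate are illustrative choices.
    """
    x = np.asarray(x0, dtype=float).copy()
    for sigma in sigmas:
        for _ in range(steps):
            x = x - lr * smoothed_distance_grad(x, data, sigma)
    return x


if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    x_final = anneal_to_data([3.0, 2.0], data)
    print("final iterate:", x_final)
    print("distance to nearest data point:",
          np.min(np.linalg.norm(data - x_final, axis=1)))
```

With large sigma the gradient pulls the iterate toward a weighted average of the whole dataset; as sigma shrinks, the weights concentrate on the nearest point and the iterate settles onto it. In an amortized setting, a learned score network would stand in for `smoothed_distance_grad`.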

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Data Classification · Reinforcement Learning in Robotics