LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies

Ximan Sun; Xiang Cheng

arXiv:2510.24983·cs.LG·February 20, 2026


TL;DR

LRT-Diffusion introduces a risk-aware, statistically calibrated guidance method for diffusion policies in offline reinforcement learning, improving out-of-distribution performance while maintaining simplicity and interpretability.

Contribution

The paper proposes a guidance mechanism based on hypothesis testing that calibrates risk at inference time, remains compatible with standard diffusion training, and improves OOD robustness in offline RL.

Findings

01 Improves the return-OOD trade-off on MuJoCo tasks

02 Provides theoretical guarantees for calibration and stability

03 Demonstrates effectiveness over Q-guided baselines

Abstract

Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the…
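
The calibrate-then-gate idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes isotropic Gaussian steps with a shared sigma, means that stay fixed across steps, and independent draws under H0 rather than a real reverse diffusion chain. The function names, the temperature parameter, and the choice of the empirical (1 - alpha) quantile for tau are all hypothetical choices made for illustration.

```python
import numpy as np

# Toy sketch only. Assumptions (not from the paper): isotropic Gaussian steps
# with shared sigma, constant means across steps, independent H0 draws.

def llr_increment(x, mu_u, mu_c, sigma):
    """log N(x; mu_c, sigma^2 I) - log N(x; mu_u, sigma^2 I), along the last axis."""
    d_u = np.sum((x - mu_u) ** 2, axis=-1)
    d_c = np.sum((x - mu_c) ** 2, axis=-1)
    return (d_u - d_c) / (2.0 * sigma ** 2)

def logistic_gate(llr, tau, temperature=1.0):
    """Soft gate in (0, 1); equals 0.5 when the accumulated LLR equals tau."""
    z = np.clip((llr - tau) / temperature, -50.0, 50.0)  # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

def gated_mean(mu_u, mu_c, gate):
    """Interpolate from the unconditional mean toward the conditional mean."""
    return mu_u + np.asarray(gate)[..., None] * (mu_c - mu_u)

def simulate_h0_llr(rng, n, steps, mu_u, mu_c, sigma):
    """Accumulated LLR for n toy trajectories drawn under H0 (unconditional)."""
    x = mu_u + sigma * rng.standard_normal((n, steps, mu_u.shape[-1]))
    return llr_increment(x, mu_u, mu_c, sigma).sum(axis=1)

def calibrate_tau(h0_llrs, alpha):
    """Pick tau so that roughly a fraction alpha of H0 trajectories exceed it."""
    return float(np.quantile(h0_llrs, 1.0 - alpha))
```

In this toy setting, one would call `simulate_h0_llr` once to build a calibration set, pass the result to `calibrate_tau` with the desired alpha, and then reuse the resulting tau inside `logistic_gate` at sampling time. How the paper defines the test statistic, the gate, and the calibration procedure in detail is not recoverable from the truncated abstract.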
