LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies
Ximan Sun, Xiang Cheng

TL;DR
LRT-Diffusion is a risk-aware, statistically calibrated guidance method for diffusion policies in offline reinforcement learning (RL). It improves the trade-off between return and out-of-distribution (OOD) behavior while keeping training standard and the risk level interpretable.
Contribution
The paper proposes a hypothesis-testing-based guidance mechanism that calibrates risk at inference time, is compatible with standard diffusion training, and improves OOD robustness in offline RL.
Findings
Improves return-OOD trade-off on MuJoCo tasks
Provides theoretical guarantees for calibration and stability
Demonstrates effectiveness over Q-guided baselines
Abstract
Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Diffusion, a risk-aware sampling rule that treats each denoising step as a sequential hypothesis test between the unconditional prior and the state-conditional policy head. Concretely, we accumulate a log-likelihood ratio and gate the conditional mean with a logistic controller whose threshold tau is calibrated once under H0 to meet a user-specified Type-I level alpha. This turns guidance from a fixed push into an evidence-driven adjustment with a user-interpretable risk budget. Importantly, we deliberately leave training vanilla (two heads with standard epsilon-prediction) under the structure of DDPM. LRT guidance composes naturally with Q-gradients: critic-gradient updates can be taken at the…
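The gating rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes Gaussian per-step likelihoods with a shared standard deviation `sigma`, a Monte Carlo estimate of tau as the empirical (1 - alpha) quantile of the accumulated log-likelihood ratio under H0, and a logistic gate with a hypothetical `temperature` parameter. All function names and toy values are illustrative.

```python
import numpy as np


def step_llr(x, mu_c, mu_u, sigma):
    """Per-step log-likelihood ratio log N(x; mu_c, s^2 I) - log N(x; mu_u, s^2 I).

    Assumes isotropic Gaussians with a shared sigma (an assumption of this sketch).
    """
    d_u = np.sum((x - mu_u) ** 2, axis=-1)
    d_c = np.sum((x - mu_c) ** 2, axis=-1)
    return (d_u - d_c) / (2.0 * sigma ** 2)


def calibrate_tau(null_llrs, alpha):
    """Empirical (1 - alpha) quantile, so that P(LLR > tau | H0) is roughly alpha."""
    return float(np.quantile(null_llrs, 1.0 - alpha))


def logistic_gate(llr, tau, temperature=1.0):
    """Soft gate in (0, 1); equals 0.5 when the accumulated evidence hits tau."""
    return 1.0 / (1.0 + np.exp(-(llr - tau) / temperature))


def gated_mean(mu_u, mu_c, llr, tau, temperature=1.0):
    """Interpolate from the unconditional mean toward the conditional mean."""
    g = logistic_gate(llr, tau, temperature)
    return mu_u + np.asarray(g)[..., None] * (mu_c - mu_u)


# Toy calibration: simulate trajectories under H0 (samples drawn around mu_u),
# accumulate the log-likelihood ratio over the steps, and pick tau once.
rng = np.random.default_rng(0)
n_runs, n_steps, dim, sigma, alpha = 10_000, 10, 2, 1.0, 0.05
mu_u = np.zeros(dim)          # hypothetical unconditional mean
mu_c = np.full(dim, 0.3)      # hypothetical state-conditional mean

x_null = mu_u + sigma * rng.standard_normal((n_runs, n_steps, dim))
null_llrs = step_llr(x_null, mu_c, mu_u, sigma).sum(axis=1)
tau = calibrate_tau(null_llrs, alpha)

# Fraction of H0 trajectories whose evidence exceeds tau (the Type-I rate).
false_alarm_rate = float(np.mean(null_llrs > tau))
```

In this toy setup, `false_alarm_rate` should land close to `alpha`, which is the sense in which the threshold acts as a risk budget: the gate opens toward the conditional mean only when the accumulated evidence is unlikely under the unconditional prior.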
