Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning

Xiaocong Chen; Siyu Wang; Tong Yu; Lina Yao

arXiv:2403.17646·cs.LG·July 3, 2025·1 cites

Open Access

TL;DR

This paper introduces an uncertainty-aware distributional offline RL method that learns risk-averse policies by modeling the full distribution of discounted cumulative rewards, addressing both epistemic uncertainty and environmental stochasticity for safer decision-making.

Contribution

It presents a model-free offline RL algorithm that characterizes the entire distribution of discounted cumulative rewards and accounts for both epistemic uncertainty and environmental stochasticity when learning risk-averse policies.

Findings

01 Superior performance in risk-sensitive benchmarks

02 Effective modeling of reward distribution

03 Addresses both epistemic uncertainty and environmental stochasticity

Abstract

Offline reinforcement learning (RL) presents distinct challenges because it relies solely on observational data. A central concern in this setting is ensuring the safety of the learned policy by quantifying the uncertainties associated with various actions and with environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method that addresses epistemic uncertainty and environmental stochasticity simultaneously. The algorithm is model-free, learns risk-averse policies, and characterizes the entire distribution of discounted cumulative rewards rather than merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through…
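To make the contrast between maximizing an expected return and acting on the whole return distribution concrete, here is a minimal sketch. It is not the paper's algorithm: the risk measure (lower-tail CVaR), the function name `cvar`, the level `alpha`, and the two sets of return samples are all assumptions chosen for illustration.

```python
import numpy as np


def cvar(returns, alpha=0.2):
    """Mean of the worst alpha-fraction of sampled returns (lower-tail CVaR).

    Hypothetical helper for illustration; alpha is an assumed risk level.
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    # Keep at least one sample so the tail is never empty.
    k = max(1, int(np.ceil(alpha * sorted_returns.size)))
    return float(sorted_returns[:k].mean())


# Made-up samples of discounted cumulative reward for two candidate actions.
safe = np.array([9.0, 10.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 10.0, 10.0])
risky = np.array([20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, 20.0, -60.0, 20.0])

# A risk-neutral agent compares expected values only.
risk_neutral_choice = "risky" if risky.mean() > safe.mean() else "safe"

# A risk-averse agent compares the lower tails of the two distributions.
risk_averse_choice = "risky" if cvar(risky) > cvar(safe) else "safe"

print("risk-neutral picks:", risk_neutral_choice)
print("risk-averse picks:", risk_averse_choice)
```

In this toy setup the risky action has the higher mean but a rare large loss, so the two criteria disagree. That disagreement is only visible when the return distribution, and not just its mean, is available.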

Taxonomy

Topics: Traffic control and management · Reinforcement Learning in Robotics