Diffusion Policies for Risk-Averse Behavior Modeling in Offline Reinforcement Learning
Xiaocong Chen, Siyu Wang, Tong Yu, Lina Yao

TL;DR
This paper introduces an uncertainty-aware distributional offline RL method that learns risk-averse policies by modeling the full distribution of discounted cumulative rewards, addressing both epistemic uncertainty and environmental stochasticity for safer decision-making.
Contribution
The paper presents a model-free offline RL algorithm that characterizes the entire distribution of discounted cumulative rewards and accounts for both epistemic uncertainty and environmental stochasticity when learning risk-averse policies.
Findings
Superior performance on risk-sensitive benchmarks
Effective modeling of the distribution of discounted cumulative rewards
Addresses both epistemic uncertainty and environmental stochasticity
Abstract
Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. We propose a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through…
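To make the contrast between maximizing expected return and learning a risk-averse policy concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes conditional value-at-risk (CVaR) as the risk measure, which the abstract above does not specify, and the function names and the `safe` and `risky` return samples are hypothetical illustrations.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward of one trajectory: sum of gamma**t * r_t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def mean(returns):
    """Expected value estimate: the risk-neutral objective."""
    return sum(returns) / len(returns)


def cvar(returns, alpha=0.1):
    """Lower-tail CVaR: average of the worst alpha-fraction of sampled returns.

    CVaR is an assumed risk measure for this sketch, not one named in the source.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)
    k = max(1, int(alpha * len(ordered)))  # number of tail samples to average
    return sum(ordered[:k]) / k


# Hypothetical return samples for two actions with the same mean
# but very different lower tails.
safe = [1.0] * 100
risky = [2.0] * 90 + [-8.0] * 10

if __name__ == "__main__":
    print("mean:", mean(safe), mean(risky))
    print("CVaR(0.1):", cvar(safe, 0.1), cvar(risky, 0.1))
```

Under these assumptions, both actions have the same expected return, so a risk-neutral objective cannot separate them. The lower-tail average can, which is why a risk-averse criterion needs access to the return distribution rather than only its mean.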
Taxonomy
Topics: Traffic control and management · Reinforcement Learning in Robotics
