Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control

Yaswanth Chittepu; Ativ Joshi; Rajarshi Bhattacharjee; Scott Niekum

arXiv:2603.10938·cs.LG·March 12, 2026

TL;DR

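This paper introduces Risk-sensitive Alignment via Dominance (RAD), a safe RLHF framework that replaces expected-cost safety constraints with stochastic dominance constraints. The goal is to control tail risks and out-of-distribution failures that expectation-based constraints can miss, improving harmlessness and robustness.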

Contribution

RAD replaces scalar expected-cost safety constraints with First-Order Stochastic Dominance (FSD) constraints, which compare entire cost distributions rather than only their means.

Findings

01. RAD improves harmlessness over baselines.

02. RAD enhances robustness on out-of-distribution evaluations.

03. RAD maintains competitive helpfulness.

Abstract

Safe Reinforcement Learning from Human Feedback (RLHF) typically enforces safety through expected cost constraints, but the expectation captures only a single statistic of the cost distribution and fails to account for distributional uncertainty, particularly under heavy tails or rare catastrophic events. This limitation is problematic when robustness and risk sensitivity are critical. Stochastic dominance offers a principled alternative by comparing entire cost distributions rather than just their averages, enabling direct control over tail risks and potential out-of-distribution failures that expectation-based constraints may overlook. In this work, we propose Risk-sensitive Alignment via Dominance (RAD), a novel alignment framework that replaces scalar expected cost constraints with First-Order Stochastic Dominance (FSD) constraints. We operationalize this constraint by comparing the…
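
To illustrate the gap between an expected-cost constraint and an FSD constraint described in the abstract, here is a minimal sketch of an empirical FSD check on cost samples. It is not the paper's implementation: the function names, the toy cost values, and the convention used (the policy's empirical cost CDF must lie at or above the reference CDF at every threshold, so that lower costs are at least as likely) are assumptions made for illustration only.

```python
from bisect import bisect_right


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def cost_fsd_holds(policy_costs, reference_costs):
    """Empirical first-order stochastic dominance check for costs.

    Assumed convention (illustrative): the check passes when, for every
    cost threshold c, the policy is at least as likely as the reference
    to incur a cost <= c, i.e. F_policy(c) >= F_reference(c).
    Empirical CDFs are step functions that only change at sample values,
    so comparing them at the pooled sample values is sufficient.
    """
    policy_sorted = sorted(policy_costs)
    reference_sorted = sorted(reference_costs)
    thresholds = sorted(set(policy_sorted) | set(reference_sorted))
    for c in thresholds:
        policy_cdf = bisect_right(policy_sorted, c) / len(policy_sorted)
        reference_cdf = bisect_right(reference_sorted, c) / len(reference_sorted)
        if policy_cdf < reference_cdf:
            return False
    return True


# Hypothetical toy data: the reference always incurs cost 1.0.
reference_costs = [1.0, 1.0, 1.0, 1.0]
# Same mean as the reference, but with one rare, large cost.
heavy_tail_costs = [0.0, 0.0, 0.0, 4.0]
# Never worse than the reference at any threshold.
safer_costs = [0.0, 0.5, 1.0, 1.0]

# The expected-cost constraint accepts the heavy-tailed policy ...
print(mean(heavy_tail_costs) <= mean(reference_costs))
# ... while the FSD check rejects it because of the tail.
print(cost_fsd_holds(heavy_tail_costs, reference_costs))
# The safer policy passes the FSD check.
print(cost_fsd_holds(safer_costs, reference_costs))
```

In this toy case the heavy-tailed samples satisfy the mean constraint yet fail the dominance check at the threshold 1.0, which is the kind of tail behavior an expectation alone cannot see.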

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Smart Grid Security and Resilience