Beyond Scalar Rewards: Distributional Reinforcement Learning with Preordered Objectives for Safe and Reliable Autonomous Driving
Ahmed Abouelazm, Jonas Michel, Daniel Bogdoll, Philip Schörner, and J. Marius Zöllner

TL;DR
This paper introduces a hierarchical multi-objective reinforcement learning framework for autonomous driving, using distributional RL with a novel comparison metric to prioritize safety and efficiency without collapsing objectives into a scalar.
Contribution
It proposes the Preordered Multi-Objective MDP and Quantile Dominance metric, enabling hierarchical decision-making and safer policies in autonomous driving.
Findings
Improved success rates in CARLA simulations
Fewer collisions and off-road events
More robust policies compared to baselines
Abstract
Autonomous driving involves multiple, often conflicting objectives such as safety, efficiency, and comfort. In reinforcement learning (RL), these objectives are typically combined through weighted summation, which collapses their relative priorities and often yields policies that violate safety-critical constraints. To overcome this limitation, we introduce the Preordered Multi-Objective MDP (Pr-MOMDP), which augments standard MOMDPs with a preorder over reward components. This structure enables reasoning about actions with respect to a hierarchy of objectives rather than a scalar signal. To make this structure actionable, we extend distributional RL with a novel pairwise comparison metric, Quantile Dominance (QD), that evaluates action return distributions without reducing them into a single statistic. Building on QD, we propose an algorithm for extracting optimal subsets, the subset…
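The contrast between a weighted sum and a hierarchy of objectives can be illustrated with a small sketch. It is not the paper's method: the excerpt above does not define Quantile Dominance, so the code substitutes plain first-order stochastic dominance on quantile estimates as a stand-in, and it simplifies the preorder to a strict priority list (safety before efficiency). All names (`dominates`, `non_dominated`, `hierarchical_select`, `weighted_sum_choice`) and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence

# Return quantiles of one action for one objective, sorted ascending,
# evaluated at the same quantile levels for every action (assumption).
Quantiles = Sequence[float]


def dominates(qa: Quantiles, qb: Quantiles) -> bool:
    """First-order stochastic dominance on quantiles (stand-in, NOT the paper's QD)."""
    if len(qa) != len(qb):
        raise ValueError("quantile vectors must share the same levels")
    pairs = list(zip(qa, qb))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def non_dominated(dists: Dict[str, Quantiles]) -> List[str]:
    """Keep every action that no other action dominates."""
    return [
        a for a in dists
        if not any(dominates(dists[b], dists[a]) for b in dists if b != a)
    ]


def hierarchical_select(objectives: List[Dict[str, Quantiles]]) -> List[str]:
    """Filter actions objective by objective, highest priority first."""
    candidates = list(objectives[0])
    for dists in objectives:
        candidates = non_dominated({a: dists[a] for a in candidates})
    return sorted(candidates)


def weighted_sum_choice(objectives: List[Dict[str, Quantiles]],
                        weights: Sequence[float]) -> str:
    """Scalar baseline: weighted sum of mean returns across objectives."""
    def score(action: str) -> float:
        return sum(w * sum(d[action]) / len(d[action])
                   for w, d in zip(weights, objectives))
    return max(objectives[0], key=score)


# Hypothetical quantile estimates for three actions.
SAFETY = {"brake": [-1, 0, 0], "keep": [-1, 0, 0], "accelerate": [-5, -2, 0]}
EFFICIENCY = {"brake": [0, 1, 2], "keep": [1, 2, 3], "accelerate": [4, 5, 6]}

print(hierarchical_select([SAFETY, EFFICIENCY]))
print(weighted_sum_choice([SAFETY, EFFICIENCY], weights=[1.0, 1.0]))
```

With these made-up numbers, the safety stage discards `accelerate` before efficiency is ever consulted, whereas the equally weighted sum lets the efficiency gain outweigh the safety loss.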
Taxonomy
Topics: Reinforcement Learning in Robotics · Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning
