Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies

Feeza Khan Khanzada; Jaerock Kwon

arXiv:2512.04279 · cs.RO · December 30, 2025

TL;DR

This paper introduces a two-stage distillation framework for autonomous driving in which the latent dynamics of a world model trained with dense rewards guide a policy trained only on sparse rewards, yielding better generalization and safety on unseen routes.

Contribution

The authors propose reward-privileged world model distillation, which transfers dense-reward knowledge into sparse-reward policies without action imitation and improves generalization in autonomous driving.
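The visible text does not give the distillation objective. As a minimal sketch, assuming teacher and student are RSSM-style world models with DreamerV3-like factored categorical latents, and assuming "distilling latent dynamics" means pulling the student's latent distribution toward the frozen teacher's, the student loss could be its own sparse-reward world-model loss plus a KL term. Note that nothing compares student actions with teacher actions. The names `latent_kl` and `student_objective`, the `beta` weight, and the 32×32 latent shape are hypothetical. This NumPy version only evaluates the loss; actual training would need an autodiff framework.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis (the classes)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def latent_kl(teacher_logits: np.ndarray, student_logits: np.ndarray) -> float:
    """KL(teacher || student) for factored categorical latents.

    Shapes are (batch, groups, classes). The KL is summed over groups and
    classes, then averaged over the batch.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    p = np.exp(log_p)
    kl = (p * (log_p - log_q)).sum(axis=(-2, -1))
    return float(kl.mean())


def student_objective(sparse_wm_loss: float,
                      teacher_logits: np.ndarray,
                      student_logits: np.ndarray,
                      beta: float = 1.0) -> tuple[float, float]:
    """Hypothetical student loss: its own world-model loss (trained on sparse
    task rewards) plus a latent-distillation term toward the frozen teacher.
    No term compares student actions with teacher actions."""
    distill = latent_kl(teacher_logits, student_logits)
    return sparse_wm_loss + beta * distill, distill


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 16 samples, 32 groups x 32 classes (a DreamerV3-like shape; an assumption).
    teacher = rng.normal(size=(16, 32, 32))
    student = rng.normal(size=(16, 32, 32))
    total, kl = student_objective(1.5, teacher, student, beta=0.5)
    print(f"distill KL = {kl:.3f}, total loss = {total:.3f}")
```

The KL direction (teacher as the target distribution) is one reasonable choice for distilling from a fixed model; the paper may use a different divergence or match other parts of the latent state.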

Findings

1. Sparse-reward students outperform dense-reward teachers on CARLA benchmarks.
2. Distillation improves success rates on unseen routes by 23%.
3. On new routes, students achieve up to a 27x improvement in overtaking success.

Abstract

We study how to exploit dense simulator-defined rewards in vision-based autonomous driving without inheriting their misalignment with deployment metrics. In realistic simulators such as CARLA, privileged state (e.g., lane geometry, infractions, time-to-collision) can be converted into dense rewards that stabilize and accelerate model-based reinforcement learning, but policies trained directly on these signals often overfit and fail to generalize when evaluated on sparse objectives such as route completion and collision-free overtaking. We propose reward-privileged world model distillation, a two-stage framework in which a DreamerV3-style teacher agent is first trained with a dense privileged reward, and only its latent dynamics are then distilled into a student trained solely on sparse task rewards. Teacher and student share the same observation space (semantic bird's-eye-view images);…
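The abstract names lane geometry, infractions and time-to-collision as privileged state that can become a dense reward, and route completion and collision-free overtaking as the sparse objectives. The sketch below contrasts the two kinds of signal. The `PrivilegedState` fields, the weights, the 2-second TTC threshold, and both reward functions are hypothetical illustrations, not the authors' reward definitions.

```python
from dataclasses import dataclass


@dataclass
class PrivilegedState:
    """Hypothetical simulator-only quantities available at each step."""
    lane_offset_m: float        # lateral distance from the lane centre
    speed_mps: float
    target_speed_mps: float
    infraction: bool            # e.g. an off-road or red-light event this step
    time_to_collision_s: float  # float("inf") when nothing is ahead


def dense_privileged_reward(s: PrivilegedState, ttc_threshold_s: float = 2.0) -> float:
    """Per-step shaped reward (hypothetical weights) for the teacher."""
    # Reward tracking the target speed, clipped to [0, 1].
    speed_err = abs(s.speed_mps - s.target_speed_mps) / max(s.target_speed_mps, 1e-6)
    r = 1.0 - min(speed_err, 1.0)
    # Penalize lateral deviation from the lane centre (capped at 2 m).
    r -= 0.5 * min(abs(s.lane_offset_m), 2.0)
    # Penalize short time-to-collision, linearly below the threshold.
    if s.time_to_collision_s < ttc_threshold_s:
        r -= (ttc_threshold_s - s.time_to_collision_s) / ttc_threshold_s
    # Flat penalty for any infraction.
    if s.infraction:
        r -= 1.0
    return r


def sparse_task_reward(route_completed: bool, overtake_completed: bool,
                       collided: bool) -> float:
    """Event-based reward (hypothetical) for the student: zero on most steps."""
    if collided:
        return -1.0
    return float(route_completed) + float(overtake_completed)


if __name__ == "__main__":
    calm = PrivilegedState(0.0, 8.0, 8.0, False, float("inf"))
    risky = PrivilegedState(1.0, 4.0, 8.0, True, 1.0)
    print(dense_privileged_reward(calm), dense_privileged_reward(risky))
    print(sparse_task_reward(True, False, False))
```

The dense reward gives feedback on every step, whereas the sparse reward stays at zero until a task event occurs; in the framework described above, only the teacher is trained on the former and the student sees only the latter.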

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics