Driving Intents Amplify Planning-Oriented Reinforcement Learning
Hengtong Lu, Victor Shea-Jay Huang, Chengmin Yang, Pengfei Jing, Jifeng Dai, Yan Xie, and Benjin Zhu

TL;DR
This paper introduces DIAL, a two-stage reinforcement learning framework for continuous-action driving policies. The first stage expands the sampling distribution through intent conditioning; the second preserves that expanded distribution through multi-intent preference RL, raising the best-of-N performance ceiling that mode collapse otherwise imposes.
Contribution
DIAL combines an intent-conditioned flow-matching action head with multi-intent preference RL (multi-intent GRPO) to overcome mode collapse in continuous-action driving policies trained on a single demonstrated trajectory per scene.
Findings
Intent conditioning lifts the performance ceiling beyond previous bests.
Multi-intent preference RL improves generalization and robustness.
Expanded sampling distribution enhances policy diversity and effectiveness.
Abstract
Continuous-action policies trained on a single demonstrated trajectory per scene suffer from mode collapse: samples cluster around the demonstrated maneuver and the policy cannot represent semantically distinct alternatives. Under preference-based evaluation, this caps best-of-N performance -- even oracle selection cannot recover what the sampling distribution does not contain. We introduce DIAL, a two-stage Driving-Intent-Amplified reinforcement Learning framework for preference-aligned continuous-action driving policies. In the first stage, DIAL conditions the flow-matching action head on a discrete intent label with classifier-free guidance (CFG), which expands the sampling distribution along distinct maneuver modes and breaks single-demonstration mode collapse. In the second stage, DIAL carries this expanded distribution into preference RL through multi-intent GRPO, which spans all…
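The best-of-N cap and the effect of intent-conditioned classifier-free guidance can be illustrated with a toy sketch. This is not DIAL's implementation: the one-dimensional "lateral offset" action, the three intent labels, the mode centers, and the closed-form velocity field standing in for a learned flow-matching head are all assumptions made for illustration. The only standard pieces are the CFG combination of conditional and unconditional velocities and Euler integration from noise.

```python
import numpy as np

# Hypothetical intent labels and toy mode centers (lateral offset in metres).
# These are illustrative assumptions, not values from the paper.
INTENT_TARGETS = {"keep_lane": 0.0, "change_left": 3.5, "change_right": -3.5}
DEMO_MODE = 0.0  # the single demonstrated maneuver an unconditional policy collapses to


def velocity(x, t, intent=None):
    """Toy stand-in for a learned flow-matching velocity field.

    For the straight-line path x_t = (1 - t) * x0 + t * x1, the field that
    carries every sample to a fixed target x1 is (x1 - x_t) / (1 - t).
    intent=None plays the role of the unconditional (null-label) branch.
    """
    target = DEMO_MODE if intent is None else INTENT_TARGETS[intent]
    return (target - x) / (1.0 - t)


def cfg_velocity(x, t, intent, guidance_scale):
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    v_uncond = velocity(x, t, None)
    v_cond = velocity(x, t, intent)
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def sample_actions(intent, n, guidance_scale=1.0, steps=20, seed=0):
    """Integrate the guided field from Gaussian noise with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the 1 / (1 - t) factor is finite
        x = x + dt * cfg_velocity(x, t, intent, guidance_scale)
    return x


def oracle_best_of_n(samples, preferred):
    """Distance from the best sample to the preferred action (lower is better)."""
    return float(np.min(np.abs(np.asarray(samples) - preferred)))


# Hypothetical scene in which the preferred maneuver is a left lane change.
preferred = INTENT_TARGETS["change_left"]

# Collapsed policy: every sample comes from the unconditional branch.
collapsed = sample_actions(None, n=24)

# Intent-conditioned policy: the same sample budget split across all intents.
multi_intent = np.concatenate([sample_actions(i, n=8) for i in INTENT_TARGETS])

print("oracle gap, collapsed:   ", oracle_best_of_n(collapsed, preferred))
print("oracle gap, multi-intent:", oracle_best_of_n(multi_intent, preferred))
```

In this toy setting the collapsed sampler never leaves the demonstrated mode, so even an oracle selector over its samples stays a full lane width away from the preferred action, while the intent-conditioned sampler contains a sample at the preferred mode. A real action head would produce trajectories rather than scalars and would keep some spread within each mode.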
