Driving Intents Amplify Planning-Oriented Reinforcement Learning

Hengtong Lu, Victor Shea-Jay Huang, Chengmin Yang, Pengfei Jing, Jifeng Dai, Yan Xie, and Benjin Zhu

arXiv:2605.12625 · cs.RO · May 15, 2026

TL;DR

This paper introduces DIAL, a two-stage reinforcement learning framework for continuous-action driving policies. It uses intent conditioning to expand the policy's sampling distribution and multi-intent preference RL to preserve that expansion, which raises the best-of-N performance ceiling.
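A toy illustration of why expanding the sampling distribution matters, following the abstract's point that even oracle selection cannot recover a maneuver the policy never samples. Everything below is hypothetical: the 1-D maneuver space, the `preference_score` function, and the two-mode "expanded" policy are assumptions made for illustration, not DIAL's policy or its evaluation metric.

```python
import random

def preference_score(action):
    """Hypothetical preference: in this toy scene the preferred maneuver is a lane change at 1.0."""
    return -abs(action - 1.0)

def oracle_best_of_n(candidates, score_fn):
    """Oracle selection: pick the candidate with the highest true preference score."""
    return max(candidates, key=score_fn)

rng = random.Random(0)

# 1-D maneuver space (hypothetical): 0.0 = keep lane, 1.0 = change lane.
# Collapsed policy: all 64 samples cluster around the single demonstrated maneuver (keep lane).
collapsed = [rng.gauss(0.0, 0.05) for _ in range(64)]
# Expanded policy: the same budget of 64 samples split across two maneuver modes.
expanded = [rng.gauss(mode, 0.05) for mode in (0.0, 1.0) for _ in range(32)]

best_collapsed = oracle_best_of_n(collapsed, preference_score)
best_expanded = oracle_best_of_n(expanded, preference_score)
print(f"collapsed policy, oracle best-of-64 score: {preference_score(best_collapsed):.3f}")
print(f"expanded policy,  oracle best-of-64 score: {preference_score(best_expanded):.3f}")
```

Because the oracle can only choose among the samples it is given, the collapsed policy's best score stays far from the preferred maneuver even with many samples, while the two-mode policy reaches it with the same budget.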

Contribution

DIAL combines intent-conditioned flow matching with multi-intent preference RL to overcome mode collapse in continuous-action driving policies trained from limited demonstrations.
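A minimal sketch of classifier-free guidance (CFG) applied to an intent-conditioned flow-matching sampler. It assumes a 1-D action (final lateral offset) and a hand-written `toy_velocity` in place of a trained action head. The intent labels, `INTENT_TARGETS`, and the collapsed unconditional branch are hypothetical. The only part taken from standard CFG is the guided velocity v_uncond + w * (v_cond - v_uncond), integrated here with explicit Euler steps.

```python
import numpy as np

# Hypothetical intent vocabulary mapped to a 1-D action: final lateral offset in metres.
INTENT_TARGETS = {0: 0.0, 1: 3.5, 2: -3.5}  # 0 = keep lane, 1 = change left, 2 = change right
# Hypothetical collapsed unconditional branch: it always heads to the single demonstrated maneuver.
DEMO_TARGET = 0.0

def toy_velocity(x, t, intent=None):
    """Stand-in for a trained flow-matching action head (hypothetical).

    On the linear path x_t = (1 - t) * noise + t * target, the velocity that
    carries x_t to `target` by t = 1 is (target - x_t) / (1 - t).
    `intent=None` plays the role of the dropped-label (unconditional) branch.
    """
    target = DEMO_TARGET if intent is None else INTENT_TARGETS[intent]
    return (target - x) / (1.0 - t)

def sample_cfg(intent, guidance_scale, n_samples=4, n_steps=20, seed=0):
    """Integrate the CFG-guided velocity from noise (t = 0) to an action (t = 1) with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)  # initial noise samples
    dt = 1.0 / n_steps
    for k in range(n_steps):  # t stays below 1, so 1 - t never hits zero
        t = k * dt
        v_uncond = toy_velocity(x, t, intent=None)
        v_cond = toy_velocity(x, t, intent=intent)
        # Classifier-free guidance: move from the unconditional velocity toward the conditional one.
        v = v_uncond + guidance_scale * (v_cond - v_uncond)
        x = x + dt * v
    return x

print(sample_cfg(intent=1, guidance_scale=1.0))  # samples steered to the change-left offset
print(sample_cfg(intent=1, guidance_scale=0.0))  # unconditional: collapsed onto the demonstration
```

With `guidance_scale=1.0` every sample ends at the change-left offset of 3.5, while `guidance_scale=0.0` reproduces the collapsed unconditional samples at 0.0. Scales above 1 extrapolate past the conditional target (w = 2 gives 7.0 in this toy). The fully deterministic endpoints are an artifact of the point-target toy, not a property of a trained flow.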

Findings

01. Intent conditioning lifts the performance ceiling beyond the previous best results.

02. Multi-intent preference RL improves generalization and robustness.

03. An expanded sampling distribution improves policy diversity and effectiveness.

Abstract

Continuous-action policies trained on a single demonstrated trajectory per scene suffer from mode collapse: samples cluster around the demonstrated maneuver and the policy cannot represent semantically distinct alternatives. Under preference-based evaluation, this caps best-of-N performance: even oracle selection cannot recover what the sampling distribution does not contain. We introduce DIAL, a two-stage Driving-Intent-Amplified reinforcement Learning framework for preference-aligned continuous-action driving policies. In the first stage, DIAL conditions the flow-matching action head on a discrete intent label with classifier-free guidance (CFG), which expands the sampling distribution along distinct maneuver modes and breaks single-demonstration mode collapse. In the second stage, DIAL carries this expanded distribution into preference RL through multi-intent GRPO, which spans all…
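The abstract is cut off before it describes multi-intent GRPO, so the following is only a sketch under a labeled assumption: that a GRPO group for one scene contains samples drawn under several intent labels, and that each sample's advantage is its reward standardised by the group's mean and standard deviation (the group-relative advantage used in GRPO). The intent names, rewards, and group size are hypothetical.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardise each reward by its group's mean and std."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical intent vocabulary and group layout (assumed, not taken from the paper).
INTENTS = ["keep_lane", "change_left", "change_right"]
SAMPLES_PER_INTENT = 2

# Hypothetical mean preference reward per intent for one scene: a left lane change is preferred.
MEAN_REWARD = {"keep_lane": 0.2, "change_left": 0.9, "change_right": 0.1}

rng = np.random.default_rng(0)
# One multi-intent group: every intent contributes samples, with small reward noise.
group = [(intent, MEAN_REWARD[intent] + 0.05 * rng.standard_normal())
         for intent in INTENTS
         for _ in range(SAMPLES_PER_INTENT)]

advantages = group_relative_advantages([reward for _, reward in group])
for (intent, reward), adv in zip(group, advantages):
    print(f"{intent:>12}  reward={reward:+.2f}  advantage={adv:+.2f}")
```

Because the normalisation runs over the whole mixed-intent group, samples are compared across maneuvers rather than only within one. In this toy the preferred intent's samples receive positive advantages and the others negative ones.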
