Action Emergence from Streaming Intent
Pengfei Jing, Victor Shea-Jay Huang, Hengtong Lu, Jifeng Dai, Yan Xie, Benjin Zhu

TL;DR
This paper introduces Streaming Intent, a novel approach for autonomous driving that generates safe, semantically appropriate actions through scene-conditioned reasoning, demonstrating controllability and competitive performance.
Contribution
The paper proposes Streaming Intent, a new mechanism enabling action emergence in autonomous driving by causally deriving and streaming driving intent across scenes and time.
Findings
Achieves a competitive Rater Feedback Score (RFS) of 7.96 on the Waymo benchmark.
Demonstrates intent-faithful controllability with high-quality, distinct plans.
Presented as the first fully end-to-end vision-language-action (VLA) model to show controllable, data-driven intent expression.
Abstract
We formalize action emergence as a target capability for end-to-end autonomous driving: the ability to generate physically feasible, semantically appropriate, and safety-compliant actions in arbitrary, long-tail traffic scenes through scene-conditioned reasoning rather than retrieval or interpolation of learned scene-action mappings. We show that previous paradigms cannot deliver action emergence: autoregressive trajectory decoders collapse the inherently multimodal future into a single averaged output, while diffusion and flow-matching generators express multimodality but are not steerable by reasoned intent. We propose Streaming Intent as a concrete way to approach action emergence: a mechanism that makes driving intent (i) semantically streamed through a continuous chain-of-thought that causally derives the intent from scene understanding, and (ii) temporally streamed across clips so…
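The abstract's point about a single output averaging a multimodal future can be illustrated with a toy case. The sketch below is not the paper's model or data: it assumes a hypothetical fork where half of the demonstrations go left and half go right, and it assumes the single output is fit under squared error, which the abstract does not state. Under those assumptions the best single prediction is the mean, which has lower error than committing to either branch yet lies far from every demonstrated behavior.

```python
import random

def best_constant_mse(targets):
    """Return the constant that minimizes mean squared error (the arithmetic mean)."""
    return sum(targets) / len(targets)

def mse(pred, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((pred - t) ** 2 for t in targets) / len(targets)

# Hypothetical toy data (an assumption, not from the paper):
# lateral offset in metres a few seconds ahead at a fork in the road.
random.seed(0)
left = [-3.0 + random.gauss(0, 0.1) for _ in range(500)]   # demonstrations going left
right = [3.0 + random.gauss(0, 0.1) for _ in range(500)]   # demonstrations going right
targets = left + right

# A single-output predictor trained with squared error converges to the mean.
single = best_constant_mse(targets)

# Distance from the averaged output to the closest real demonstration.
gap = min(abs(single - t) for t in targets)

print(f"averaged output:             {single:+.2f} m")
print(f"MSE of averaged output:      {mse(single, targets):.2f}")
print(f"MSE of committing to left:   {mse(-3.0, targets):.2f}")
print(f"gap to nearest demonstration: {gap:.2f} m")
```

The averaged output wins on squared error but corresponds to driving straight into the fork, which no demonstration does. This is the failure the abstract attributes to single-output decoders, and it is why a multimodal generator needs some steering signal to select a mode.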
