SceneStreamer: Continuous Scenario Generation as Next Token Group Prediction

Zhenghao Peng; Yuxin Liu; Bolei Zhou

arXiv:2506.23316·cs.RO·March 4, 2026

Open Access · 3 Reviews

TL;DR

SceneStreamer is a transformer-based framework that generates continuous, realistic traffic scenarios for autonomous driving simulation, supporting long-duration, dynamic agent interactions and improving policy robustness.

Contribution

It introduces a novel autoregressive token-based approach for long-horizon traffic scenario generation, enabling dynamic agent management and realistic behavior modeling.

Findings

01. Produces diverse, realistic traffic scenarios
02. Enhances robustness of autonomous driving policies trained in simulation
03. Supports unbounded, long-duration scenario generation

Abstract

Realistic and interactive traffic simulation is essential for training and evaluating autonomous driving systems. However, most existing data-driven simulation methods rely on static initialization or log-replay data, limiting their ability to model dynamic, long-horizon scenarios with evolving agent populations. We propose SceneStreamer, a unified autoregressive framework for continuous scenario generation that represents the entire scene as a sequence of tokens, including traffic light signals, agent states, and motion vectors, and generates them step by step with a transformer model. This design enables SceneStreamer to continuously introduce and retire agents over an unbounded horizon, supporting realistic long-duration simulation. Experiments demonstrate that SceneStreamer produces realistic, diverse, and adaptive traffic behaviors. Furthermore, reinforcement learning policies…
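The step-by-step generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the transformer is replaced by a hypothetical random stub (`stub_next_token`), and the token names (`LIGHT`, `AGENT_STATE`, `MOTION`, `EXIT`), the `max_agents` cap, and the idea that an `EXIT` motion token retires an agent are all assumptions made for illustration. The sketch only shows how emitting one token group per step lets the agent population grow and shrink over an open-ended rollout.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical container for the running token sequence."""
    active_agents: list = field(default_factory=list)
    next_agent_id: int = 0
    tokens: list = field(default_factory=list)


def stub_next_token(rng, choices):
    """Stand-in for sampling from a transformer's next-token distribution."""
    return rng.choice(choices)


def generate_step(scene, step, rng, max_agents=4):
    """Emit one token group: traffic light, optional new agent, then motions."""
    # 1) Traffic light signal for this step.
    light = stub_next_token(rng, ["red", "yellow", "green"])
    scene.tokens.append(("LIGHT", step, light))

    # 2) Agent state: optionally introduce a new agent (assumed cap).
    if len(scene.active_agents) < max_agents and stub_next_token(rng, [True, False]):
        agent_id = scene.next_agent_id
        scene.next_agent_id += 1
        scene.active_agents.append(agent_id)
        scene.tokens.append(("AGENT_STATE", step, agent_id))

    # 3) Motion token per active agent; "EXIT" (assumed) retires the agent.
    survivors = []
    for agent_id in scene.active_agents:
        motion = stub_next_token(rng, ["accelerate", "keep", "brake", "EXIT"])
        scene.tokens.append(("MOTION", step, agent_id, motion))
        if motion != "EXIT":
            survivors.append(agent_id)
    scene.active_agents = survivors


def rollout(num_steps, seed=0):
    """Run the loop for num_steps; nothing in the loop bounds the horizon."""
    rng = random.Random(seed)
    scene = Scene()
    for step in range(num_steps):
        generate_step(scene, step, rng)
    return scene
```

Calling `rollout(20)` returns a `Scene` whose `tokens` list interleaves the three kinds of token groups in generation order. A real model would condition each sample on the map and on all previously generated tokens instead of drawing uniformly at random.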

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

Originality: The core idea of framing multi-agent, dynamic scenario generation as a unified next-token prediction task using a single autoregressive model (InfGen) is highly original in this domain. Specifically, the autoregressive generation of agent states (Type, Map ID, Relative State tokens: ⟨SOA,TYPE,MS,RS⟩_t) anchored to map segments is a clever mechanism for achieving physically and semantically consistent agent initialization, which is a major advancement over prior non-causal "flat" de
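The map-anchored agent-state group ⟨SOA,TYPE,MS,RS⟩ mentioned by the reviewer can be sketched as follows. This is an illustration under assumptions, not the paper's encoding: the `MapSegment` and `AgentStateGroup` classes, the split of the relative state into longitudinal offset, lateral offset, and heading offset, and the use of continuous values rather than discretized tokens are all hypothetical. The point is only that expressing a new agent's state relative to a map segment keeps its pose tied to the road geometry.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MapSegment:
    """Hypothetical map segment: an anchor point and a heading in radians."""
    x: float
    y: float
    heading: float


@dataclass(frozen=True)
class AgentStateGroup:
    """One <SOA, TYPE, MS, RS> group; the RS layout is an assumption."""
    agent_type: str          # TYPE
    map_segment_id: int      # MS
    rel_longitudinal: float  # RS: offset along the segment heading
    rel_lateral: float       # RS: offset to the left of the segment heading
    rel_heading: float       # RS: heading offset relative to the segment


def to_tokens(group):
    """Flatten the group into the token order named in the review."""
    rs = (group.rel_longitudinal, group.rel_lateral, group.rel_heading)
    return ["SOA", f"TYPE:{group.agent_type}", f"MS:{group.map_segment_id}", f"RS:{rs}"]


def to_global_pose(group, segments):
    """Rotate the relative offsets into the global frame of the anchor segment."""
    seg = segments[group.map_segment_id]
    cos_h, sin_h = math.cos(seg.heading), math.sin(seg.heading)
    x = seg.x + cos_h * group.rel_longitudinal - sin_h * group.rel_lateral
    y = seg.y + sin_h * group.rel_longitudinal + cos_h * group.rel_lateral
    return x, y, seg.heading + group.rel_heading
```

For example, an agent placed 2 m ahead of a segment anchored at (10, 5) and pointing along the positive y-axis ends up at roughly (10, 7) with the segment's heading.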

Weaknesses

1. Limited Motion Prediction Benchmarking: While the core focus is scenario generation, comparing InfGen's motion prediction performance only against its own ablated version (InfGen-Motion vs. InfGen-Full) is insufficient. The motion prediction task (Sec 3.2) is standard, and performance should be compared against state-of-the-art motion prediction baselines on the Waymo Open Motion Dataset (WOMD) to properly contextualize the model's trajectory-modeling capability.
2. Lack of Diversity Metrics

Reviewer 02 · Rating 4 · Confidence 4

Strengths

● Unified formulation: Modeling the entire scene as a next-token sequence provides a unified autoregressive framework capturing spatiotemporal dependencies among maps, lights, and agents. This “traffic as language” design enhances long-horizon consistency, supports multiple tasks, and enables seamless dynamic scene evolution.
● Dynamic agent injection: The model can add or remove agents at different timesteps, breaking from the fixed-agent assumption and better reflecting open-world traffic wher

Weaknesses

● Limited performance on core WOMD metrics: Despite its novel formulation, InfGen does not achieve competitive results on the core WOMD leaderboard—particularly on mADE, which is the primary metric of the Waymo Challenge. Its overall scores lag behind recent strong baselines such as UniMM and CAT-K, raising concerns about whether the proposed architectural contributions and dynamic scenario generation truly translate into better motion accuracy or downstream utility. The claimed advantage in sup
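For readers unfamiliar with displacement-based metrics, the sketch below computes a minimum average displacement error over several sampled rollouts. It assumes that the reviewer's "mADE" refers to a metric of this kind; the function names and the toy trajectories are hypothetical, and the official Waymo evaluation code should be consulted for the exact definition used on the leaderboard.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    if len(predicted) != len(ground_truth):
        raise ValueError("trajectories must have the same number of waypoints")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(ground_truth)


def min_average_displacement_error(rollouts, ground_truth):
    """Smallest ADE among several sampled rollouts for the same agent."""
    return min(average_displacement_error(r, ground_truth) for r in rollouts)


# Toy example: the second rollout tracks the logged trajectory more closely.
logged = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.3)],
]
best = min_average_displacement_error(samples, logged)
```

Taking the minimum rewards a model if any one of its samples is close to the logged trajectory, so the metric measures accuracy against the log and says little about the diversity or interactivity of the generated scenarios.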

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Closed-loop simulation
2. Unified modeling of the whole scenario

Weaknesses

1. Confusing positioning of the contribution and the experimental setting
2. Lack of comprehensive comparison

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Traffic control and management · Traffic Prediction and Management Techniques