ScenarioControl: Vision-Language Controllable Vectorized Latent Scenario Generation

Lili Gao; Yanbo Xu; William Koch; Samuele Ruffino; Luke Rowe; Behdad Chalaki; Dmitriy Rivkin; Julian Ost; Roger Girgis; Mario Bijelic; Felix Heide

arXiv:2604.17147·cs.CV·April 21, 2026

TL;DR

ScenarioControl is a novel vision-language system for generating diverse, realistic 3D driving scenarios with fine-grained control over layout and traffic, supporting long-term, multi-view simulations.

Contribution

ScenarioControl introduces the first control mechanism for learned driving scenario generation that integrates multimodal inputs (text prompts or images) with a vectorized latent space.

Findings

01. Produces temporally consistent 3D scenarios from different viewpoints.

02. Achieves high control fidelity and realism compared to existing methods.

03. Supports long-horizon scenario continuation.

Abstract

We introduce ScenarioControl, the first vision-language control mechanism for learned driving scenario generation. Given a text prompt or an input image, ScenarioControl synthesizes diverse, realistic 3D scenario rollouts, including map, 3D boxes of reactive actors over time, pedestrians, driving infrastructure, and ego camera observations. The method generates scenes in a vectorized latent space that represents road structure and dynamic agents jointly. To connect multimodal control with sparse vectorized scene elements, we propose a cross-global control mechanism that integrates cross-attention with a lightweight global-context branch, enabling fine-grained control over road layout and traffic conditions while preserving realism. The method produces temporally consistent scenario rollouts from the perspectives of different actors in the scene, supporting long-horizon continuation of…
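The abstract names the cross-global control mechanism but does not spell out its architecture. As a rough illustration only, the idea of combining cross-attention with a global-context branch might be sketched as below. Everything here is an assumption made for the example and is not taken from the paper: the function names, the use of identity projections instead of learned weights, mean pooling for the global branch, and the fixed `gate` value.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(scene_tokens, control_tokens):
    # Each scene token (query) attends over the control tokens (keys/values).
    # Assumption: identity projections, so no learned Q/K/V weights.
    d = len(control_tokens[0])
    out = []
    for q in scene_tokens:
        scores = [dot(q, k) / math.sqrt(d) for k in control_tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, control_tokens))
                    for j in range(d)])
    return out

def global_context(control_tokens):
    # Assumption: the global branch is a plain mean-pool of the control tokens.
    n = len(control_tokens)
    d = len(control_tokens[0])
    return [sum(t[j] for t in control_tokens) / n for j in range(d)]

def cross_global_control(scene_tokens, control_tokens, gate=0.5):
    # Residual update: per-token attended signal plus a gated global summary.
    attended = cross_attention(scene_tokens, control_tokens)
    g = global_context(control_tokens)
    return [[s + a + gate * gj for s, a, gj in zip(tok, att, g)]
            for tok, att in zip(scene_tokens, attended)]

# Hypothetical toy inputs: two vectorized scene elements, one control token.
scene = [[1.0, 0.0], [0.0, 1.0]]
control = [[1.0, 1.0]]
conditioned = cross_global_control(scene, control)
```

In this toy form, the attention branch gives each sparse scene element its own view of the prompt or image tokens, while the pooled branch applies the same summary to every element. A real model would use learned projections and a learned gate.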

Code & Models

Repositories

https://light.princeton.edu/ScenarioControl
github
