Infrastructure-Centric World Models: Bridging Temporal Depth and Spatial Breadth for Roadside Perception

Siyuan Meng; Chengbo Ai

arXiv:2604.17651 · cs.CV · April 21, 2026

TL;DR

This paper proposes infrastructure-centric world models for autonomous driving. They pair roadside sensors, which build long-term behavioral understanding, with vehicle-borne sensors, which sample scenes across the road network, to improve traffic perception and prediction.

Contribution

The paper introduces a three-phase framework for infrastructure-centric world models that integrates multi-sensor data, uncertainty propagation, and collaborative V2X communication.

Findings

01. Proposes a dual-layer architecture for perception and world modeling.

02. Defines a taxonomy of driving world model paradigms and positions I-WM within it.

03. Identifies open-source foundations for each phase of the proposed framework.

Abstract

World models, generative AI systems that simulate how environments evolve, are transforming autonomous driving, yet all existing approaches adopt an ego-vehicle perspective, leaving the infrastructure viewpoint unexplored. We argue that infrastructure-centric world models offer a fundamentally complementary capability: the bird's-eye, multi-sensor, persistent viewpoint that roadside systems uniquely possess. Central to our thesis is a spatio-temporal complementarity: fixed roadside sensors excel at temporal depth, accumulating long-term behavioral distributions including rare safety-critical events, while vehicle-borne sensors excel at spatial breadth, sampling diverse scenes across large road networks. This paper presents a vision for Infrastructure-centric World Models (I-WM) in three phases: (I) generative scene understanding with quality-aware uncertainty propagation, (II)…
