StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention

Zhongrui Yu; Zhao Wang; Yijia Xie; Yida Wang; Xueyang Zhang; Yifei Zhan; Kun Zhan

arXiv:2603.19552 · cs.CV · March 23, 2026

TL;DR

StreetForward is a feedforward framework for dynamic street scene reconstruction. It combines temporal attention with 3D Gaussian Splatting to produce high-fidelity novel views and depth estimates, enabling rapid scene understanding for autonomous driving.

Contribution

The paper introduces a temporal mask attention module and a unified 3D Gaussian Splatting representation for static and dynamic content, reconstructing dynamic scenes without per-scene optimization.

Findings

01. Outperforms existing methods on the Waymo dataset for view synthesis and depth estimation.

02. Demonstrates strong zero-shot generalization to CARLA and other datasets.

03. Produces high-fidelity novel views with spatio-temporal consistency.

Abstract

Feedforward reconstruction is crucial for autonomous driving applications, where rapid scene reconstruction enables efficient utilization of large-scale driving datasets in closed-loop simulation and other downstream tasks, eliminating the need for time-consuming per-scene optimization. We present StreetForward, a pose-free and tracker-free feedforward framework for dynamic street reconstruction. Building upon the alternating attention mechanism from Visual Geometry Grounded Transformer (VGGT), we propose a simple yet effective temporal mask attention module that captures dynamic motion information from image sequences and produces motion-aware latent representations. Static content and dynamic instances are represented uniformly with 3D Gaussian Splatting, and are optimized jointly by cross-frame rendering with spatio-temporal consistency, allowing the model to infer per-pixel…
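
The abstract does not spell out how the temporal mask is built, so the following is only a minimal sketch of one plausible reading suggested by the title: a frame-level causal mask, where tokens of frame t may attend to tokens of frames up to and including t. The function names `frame_causal_mask` and `masked_attention`, the single-head formulation, and the toy token values are all assumptions made for illustration; none of them come from the paper or from VGGT.

```python
import math

def frame_causal_mask(num_frames, tokens_per_frame):
    """Assumed mask: mask[i][j] is True when token i may attend to token j,
    i.e. when token j belongs to the same frame or an earlier one."""
    frame_of = [f for f in range(num_frames) for _ in range(tokens_per_frame)]
    return [[fi >= fj for fj in frame_of] for fi in frame_of]

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask.
    q, k, v are lists of equal-length float vectors, one per token."""
    dim = len(q[0])
    outputs, all_weights = [], []
    for i, qi in enumerate(q):
        # Masked-out positions get -inf so they receive zero weight.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim)
            if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite: every token can attend to itself
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        all_weights.append(weights)
        outputs.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )
    return outputs, all_weights

# Toy input: 3 frames with 2 tokens each, 2-dimensional features.
mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
tokens = [[float(i), 1.0] for i in range(6)]
out, weights = masked_attention(tokens, tokens, tokens, mask)
```

In this sketch, tokens of the first frame put zero weight on all later frames, while tokens of the last frame can draw on the whole sequence. Whether StreetForward masks at the frame level, per token, or in another pattern cannot be determined from the abstract alone.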

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications