StreetForward: Perceiving Dynamic Street with Feedforward Causal Attention
Zhongrui Yu, Zhao Wang, Yijia Xie, Yida Wang, Xueyang Zhang, Yifei Zhan, Kun Zhan

TL;DR
StreetForward is a feedforward framework for dynamic street scene reconstruction that combines temporal attention with 3D Gaussian Splatting to produce high-fidelity novel views and depth estimates, enabling rapid scene understanding for autonomous driving.
Contribution
It introduces a novel temporal mask attention module and a unified 3D Gaussian Splatting representation for dynamic scene reconstruction without per-scene optimization.
Findings
Outperforms existing methods on the Waymo dataset in both view synthesis and depth estimation.
Demonstrates strong zero-shot generalization to CARLA and other datasets.
Produces high-fidelity novel views with spatio-temporal consistency.
Abstract
Feedforward reconstruction is crucial for autonomous driving applications, where rapid scene reconstruction enables efficient utilization of large-scale driving datasets in closed-loop simulation and other downstream tasks, eliminating the need for time-consuming per-scene optimization. We present StreetForward, a pose-free and tracker-free feedforward framework for dynamic street reconstruction. Building upon the alternating attention mechanism from Visual Geometry Grounded Transformer (VGGT), we propose a simple yet effective temporal mask attention module that captures dynamic motion information from image sequences and produces motion-aware latent representations. Static content and dynamic instances are represented uniformly with 3D Gaussian Splatting, and are optimized jointly by cross-frame rendering with spatio-temporal consistency, allowing the model to infer per-pixel…
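The abstract names a temporal mask attention module but, as excerpted here, does not spell out the mask pattern. As a rough illustration only, the sketch below assumes a frame-level causal mask (suggested by "Causal Attention" in the title): tokens of frame t may attend to tokens of frames up to and including t. The function names `frame_causal_mask` and `masked_attention`, the single-head layout, and the token shapes are hypothetical and are not taken from the paper or from VGGT.

```python
import numpy as np


def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean mask of shape (N, N), N = num_frames * tokens_per_frame.

    Entry [i, j] is True when query token i may attend to key token j.
    Assumption: a token sees its own frame and all earlier frames.
    """
    frame_ids = np.repeat(np.arange(num_frames), tokens_per_frame)
    return frame_ids[:, None] >= frame_ids[None, :]


def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention restricted by a boolean mask.

    q, k, v: arrays of shape (N, D); mask: boolean array of shape (N, N).
    Returns the attended values (N, D) and the attention weights (N, N).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed pairs get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row has at
    # least one allowed entry (its own frame), so the maximum is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Under this assumed mask, the output for an earlier frame does not depend on later frames, which is one simple way a model could process an image sequence in temporal order. The actual module may use a different pattern.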
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
