DriveFix: Spatio-Temporally Coherent Driving Scene Restoration

Heyu Si, Brandon James Denis, Muyang Sun, Dragos Datcu, Yaoru Li, Xin Jin, Ruiju Fu, Yuliia Tatarinova, Federico Landi, Jie Song, Mingli Song, and Qi Guo

arXiv:2603.16306 · cs.CV · March 18, 2026

Open Access

TL;DR

DriveFix introduces a spatio-temporally coherent multi-view restoration framework for driving scenes, leveraging diffusion transformers and geometry-aware training to improve 4D scene reconstruction and view synthesis in autonomous driving.

Contribution

It presents a novel interleaved diffusion transformer architecture that explicitly models temporal and spatial dependencies for coherent scene restoration.
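The interleaving idea can be illustrated with a toy sketch. This is not the authors' implementation: it assumes one feature vector per (frame, camera) pair, uses single-head attention with identity projections, and omits the diffusion process and geometry-aware losses entirely. The names `temporal_block`, `cross_view_block`, and `interleaved_stack` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections and a residual.

    tokens: list of feature vectors (lists of floats) of equal length.
    """
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * key[j] for w, key in zip(weights, tokens))
            for j in range(dim)
        ]
        out.append([q + m for q, m in zip(query, mixed)])
    return out


def temporal_block(x):
    """Attend across time for each camera separately. x[t][v] is a vector."""
    num_frames, num_views = len(x), len(x[0])
    out = [[None] * num_views for _ in range(num_frames)]
    for v in range(num_views):
        sequence = self_attention([x[t][v] for t in range(num_frames)])
        for t in range(num_frames):
            out[t][v] = sequence[t]
    return out


def cross_view_block(x):
    """Attend across cameras within each frame separately."""
    return [self_attention(frame) for frame in x]


def interleaved_stack(x, depth=2):
    """Alternate temporal and cross-view blocks (assumed ordering)."""
    for _ in range(depth):
        x = temporal_block(x)
        x = cross_view_block(x)
    return x


# Toy input: 2 frames, 2 cameras, 2-dimensional features.
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
]
restored = interleaved_stack(features, depth=1)
```

In this sketch the temporal block only mixes information along the time axis for a fixed camera, and the cross-view block only mixes information across cameras at a fixed time; alternating the two lets every (frame, camera) token eventually depend on every other one.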

Findings

01. Achieves state-of-the-art results on Waymo, nuScenes, and PandaSet datasets.

02. Reduces artifacts and improves texture consistency across views and time.

03. Enforces 3D geometric consistency in scene reconstruction.

Abstract

Recent advancements in 4D scene reconstruction, particularly those leveraging diffusion priors, have shown promise for novel view synthesis in autonomous driving. However, these methods often process frames independently or in a view-by-view manner, leading to a critical lack of spatio-temporal synergy. This results in spatial misalignment across cameras and temporal drift in sequences. We propose DriveFix, a novel multi-view restoration framework that ensures spatio-temporal coherence for driving scenes. Our approach employs an interleaved diffusion transformer architecture with specialized blocks to explicitly model both temporal dependencies and cross-camera spatial consistency. By conditioning the generation on historical context and integrating geometry-aware training losses, DriveFix enforces that the restored views adhere to a unified 3D geometry. This enables the consistent…

Taxonomy

Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques