AutoScape: Geometry-Consistent Long-Horizon Scene Generation

Jiacheng Chen; Ziyu Jiang; Mingfu Liang; Bingbing Zhuang; Jong-Chyi Su; Sparsh Garg; Ying Wu; Manmohan Chandraker

arXiv:2510.20726 · cs.CV · October 24, 2025

Open Access

TL;DR

AutoScape is an RGB-D diffusion framework for long-horizon driving scene generation. It produces geometrically consistent driving videos of over 20 seconds and improves long-horizon FID and FVD over the prior state of the art by 48.6% and 43.0%, respectively.

Contribution

The paper presents a diffusion-based approach to long-horizon scene generation that models image and depth jointly, conditions explicitly on the geometry of previously generated keyframes, and steers sampling with warp-consistent guidance.
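One generic way to picture warp-guided sampling is as a gradient nudge that pulls the model's current estimate toward content warped from earlier keyframes. The sketch below is not the paper's formulation, which this page does not spell out. It assumes a simple masked quadratic penalty, and the names `warp_consistency_grad`, `guided_estimate`, and `scale` are hypothetical.

```python
import numpy as np

def warp_consistency_grad(x0_pred, warped, mask):
    """Gradient of 0.5 * sum(mask * (x0_pred - warped)**2) w.r.t. x0_pred.

    x0_pred: (H, W, C) current estimate of the clean RGB-D keyframe.
    warped:  (H, W, C) content warped in from an earlier keyframe.
    mask:    (H, W) bool, True where the warp provides a valid value.
    """
    m = mask[..., None].astype(x0_pred.dtype)
    return m * (x0_pred - warped)

def guided_estimate(x0_pred, warped, mask, scale=0.5):
    """Take one gradient step toward the warped content (assumed guidance rule)."""
    return x0_pred - scale * warp_consistency_grad(x0_pred, warped, mask)
```

With `scale=1.0` the masked pixels are replaced outright by the warped values, while smaller scales blend the two. Pixels outside the mask are left untouched, so newly revealed regions remain free for the generator to fill.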

Findings

01 Generates realistic, geometrically consistent driving videos of over 20 seconds.

02 Improves long-horizon FID by 48.6% over the prior state of the art.

03 Improves long-horizon FVD by 43.0% over the prior state of the art.
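FID and FVD are both lower-is-better scores, so a percentage improvement is most naturally read as a relative reduction against the baseline. The page does not state the formula or the underlying scores, so the sketch below assumes that reading. The function name and the example values are hypothetical, not numbers from the paper.

```python
def relative_improvement(baseline, ours):
    """Percent reduction of a lower-is-better score relative to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - ours) / baseline

# Hypothetical scores chosen only to illustrate the arithmetic:
# a baseline of 100.0 reduced to 51.4 is a 48.6% improvement.
example = relative_improvement(100.0, 51.4)
```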

Abstract

This paper proposes AutoScape, a long-horizon driving scene generation framework. At its core is a novel RGB-D diffusion model that iteratively generates sparse, geometrically consistent keyframes, serving as reliable anchors for the scene's appearance and geometry. To maintain long-range geometric consistency, the model 1) jointly handles image and depth in a shared latent space, 2) explicitly conditions on the existing scene geometry (i.e., rendered point clouds) from previously generated keyframes, and 3) steers the sampling process with warp-consistent guidance. Given high-quality RGB-D keyframes, a video diffusion model then interpolates between them to produce dense and coherent video frames. AutoScape generates realistic and geometrically consistent driving videos of over 20 seconds, improving the long-horizon FID and FVD scores over the prior state-of-the-art by 48.6% and…
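To make "rendered point clouds" concrete, here is a minimal NumPy sketch of how an RGB-D keyframe could be lifted to 3D and re-rendered from a new camera pose. It assumes a pinhole camera with intrinsics `K` and a known 4x4 source-to-target transform. The function names and the nearest-pixel splatting are illustrative assumptions, not the paper's renderer.

```python
import numpy as np

def unproject(depth, K):
    """Lift an (H, W) depth map to an (H*W, 3) point cloud in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # back-project pixels to unit-depth rays
    return rays * depth.reshape(-1, 1)       # scale each ray by its depth

def render_points(points, colors, K, T_src_to_tgt, hw):
    """Splat colored points into a target view; returns (rgb, depth, valid mask)."""
    h, w = hw
    pts = points @ T_src_to_tgt[:3, :3].T + T_src_to_tgt[:3, 3]
    front = pts[:, 2] > 1e-6                 # keep points in front of the camera
    pts, cols = pts[front], colors[front]
    proj = pts @ K.T
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, cols = u[inside], v[inside], pts[inside, 2], cols[inside]
    order = np.argsort(-z)                   # write far to near so the nearest point wins
    rgb = np.zeros((h, w, 3), dtype=colors.dtype)
    depth = np.full((h, w), np.inf)
    rgb[v[order], u[order]] = cols[order]
    depth[v[order], u[order]] = z[order]
    return rgb, depth, np.isfinite(depth)
```

The returned mask marks pixels covered by existing geometry. In a setup like the one the abstract describes, such a render would act as the conditioning signal for the next keyframe, with the uncovered pixels left for the generator to fill.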

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging