Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image

Stanislaw Szymanowicz; Eldar Insafutdinov; Chuanxia Zheng; Dylan Campbell; João F. Henriques; Christian Rupprecht; Andrea Vedaldi

arXiv:2406.04343 · cs.CV · June 3, 2025

TL;DR

Flash3D is a fast, generalisable method for 3D scene reconstruction and novel view synthesis from a single image, leveraging a foundation depth model and Gaussian Splatting for efficient, high-quality results across diverse datasets.

Contribution

The paper introduces a feed-forward approach that extends a foundation model for monocular depth estimation with Gaussian Splatting, giving efficient, generalisable 3D reconstruction from a single image.

Findings

01 Achieves state-of-the-art results when trained and tested on RealEstate10k.

02 Outperforms competitors by a large margin when transferred to unseen datasets such as NYU.

03 Surpasses some multi-view methods on KITTI.

Abstract

We propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a "foundation" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively,…
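
The layered construction described in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, and it assumes that each extra layer is pushed further along the camera ray by a non-negative depth offset. The abstract only says the layers are "offset in space", so this parameterisation and every name below are hypothetical and not taken from the Flash3D code. Only the Gaussian centres are computed; opacity, covariance and colour are left out.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = Sequence[Sequence[float]]  # H x W values, indexed as grid[v][u]


def unproject(u: int, v: int, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point:
    """Lift pixel (u, v) at the given depth to a 3D point (pinhole model)."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def layered_gaussian_means(depth: Grid, offsets: Sequence[Grid],
                           fx: float, fy: float,
                           cx: float, cy: float) -> List[List[Point]]:
    """Return one list of Gaussian centres per layer.

    Layer 0 sits at the predicted depth. Each later layer adds its own
    non-negative offset to the depth of the previous layer (an assumption
    of this sketch), so later layers lie behind earlier ones.
    """
    height, width = len(depth), len(depth[0])
    running = [list(row) for row in depth]  # cumulative depth per pixel
    layers: List[List[Point]] = []
    for k in range(len(offsets) + 1):
        if k > 0:
            extra = offsets[k - 1]
            for v in range(height):
                for u in range(width):
                    # Clamp so a layer never moves in front of the previous one.
                    running[v][u] += max(0.0, extra[v][u])
        layers.append([
            unproject(u, v, running[v][u], fx, fy, cx, cy)
            for v in range(height)
            for u in range(width)
        ])
    return layers
```

Calling `layered_gaussian_means(depth, offsets, fx, fy, cx, cy)` with one depth map and one offset map yields two layers of centres: the first on the visible surface, the second displaced behind it, where it could represent occluded content.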

Code & Models

Repositories

eldar/flash3d · jax · Official

Taxonomy

Topics: Advanced Vision and Imaging · 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization

Methods: Balanced Selection