Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

Jiaxin Huang; Yuanbo Yang; Bangbang Yang; Lin Ma; Yuewen Ma; Yiyi Liao

arXiv:2601.04090 · cs.CV · March 24, 2026

TL;DR

Gen3R couples the VGGT reconstruction model with a pre-trained video diffusion model to generate 3D scenes from one or more input images. It reports state-of-the-art results in single- and multi-image conditioned generation and shows that generative priors make reconstruction more robust.

Contribution

The paper trains an adapter on VGGT tokens to produce geometric latents, regularizes those latents to align with the appearance latents of a pre-trained video diffusion model, and generates both sets of latents jointly for 3D scene generation.

Findings

01 Achieves state-of-the-art results in 3D scene generation from images.

02 Produces both RGB videos and 3D geometry including camera poses and depth maps.

03 Enhances reconstruction robustness using generative priors.

Abstract

We present Gen3R, a method that bridges the strong priors of foundational reconstruction models and video diffusion models for scene-level 3D generation. We repurpose the VGGT reconstruction model to produce geometric latents by training an adapter on its tokens, which are regularized to align with the appearance latents of pre-trained video diffusion models. By jointly generating these disentangled yet aligned latents, Gen3R produces both RGB videos and corresponding 3D geometry, including camera poses, depth maps, and global point clouds. Experiments demonstrate that our approach achieves state-of-the-art results in single- and multi-image conditioned 3D scene generation. Additionally, our method can enhance the robustness of reconstruction by leveraging generative priors, demonstrating the mutual benefit of tightly coupling reconstruction and generative models.
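The abstract lists depth maps, camera poses, and global point clouds among the outputs. As a rough illustration of how the first two relate to the third, the sketch below lifts per-view depth maps into one shared world frame. It is not the authors' code: the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the 3x4 camera-to-world pose layout, and the function names `unproject_depth` and `fuse_views` are all assumptions made for this example, and the paper may use different conventions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    cam_to_world: Sequence[Sequence[float]],
) -> List[Point]:
    """Lift one depth map to world-space points.

    Assumptions (hypothetical, not taken from the paper):
    - depth[v][u] is the z-distance along the optical axis at pixel (u, v)
    - the camera is a pinhole with focal lengths (fx, fy) and centre (cx, cy)
    - cam_to_world is a 3x4 row-major matrix [R | t]
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid or missing depth
            # Back-project the pixel into camera coordinates.
            cam = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Apply the rigid transform: R @ cam + t.
            world = tuple(
                sum(r[i] * cam[i] for i in range(3)) + r[3]
                for r in cam_to_world
            )
            points.append(world)
    return points


def fuse_views(views) -> List[Point]:
    """Concatenate the points of several (depth, intrinsics, pose) views."""
    cloud: List[Point] = []
    for depth, intrinsics, pose in views:
        cloud.extend(unproject_depth(depth, *intrinsics, pose))
    return cloud
```

Because every view is mapped through its own pose into the same world frame, the fused list plays the role of a global point cloud; a practical pipeline would typically vectorize this and filter low-confidence depths.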

Code & Models

Models

JaceyH919/Gen3R (model, 66 downloads, 2 likes)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · 3D Shape Modeling and Analysis