Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation

Junlin Han; Jianyuan Wang; Andrea Vedaldi; Philip Torr; Filippos Kokkinos

arXiv:2410.00890 · cs.CV · June 3, 2025

Open Access

TL;DR

Flex3D is a two-stage 3D generation framework: it first generates and curates a pool of candidate views, then passes the selected high-quality views to a transformer-based reconstruction model that produces 3D content from text, a single image, or sparse views.

Contribution

The paper presents Flex3D, a framework that accepts an arbitrary number of input views and uses a transformer-based reconstruction model to improve 3D generation.

Findings

01

Achieves state-of-the-art 3D generation performance.

02

Reaches a user-study winning rate of over 92% in 3D tasks.

03

Effective view curation enhances reconstruction quality.

Abstract

Generating high-quality 3D content from text, single images, or sparse view images remains a challenging task with broad applications. Existing methods typically employ multi-view diffusion models to synthesize multi-view images, followed by a feed-forward process for 3D reconstruction. However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. The first stage consists of a candidate view generation and curation pipeline. We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a…

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Robotics and Sensor-Based Localization

Methods: Diffusion