You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

Baorui Ma, Huachen Gao, Haoge Deng, Zhengxiong Luo, Tiejun Huang, Lulu Tang, Xinlong Wang

arXiv:2412.06699 · cs.CV · March 24, 2025



TL;DR

See3D is a scalable 3D generation model trained on large-scale Internet videos. It enables open-world 3D creation without explicit 3D annotations and outperforms prior models on benchmarks.

Contribution

The paper introduces See3D, a novel 3D generation framework trained on large-scale web videos. It combines a new data curation pipeline with a pose-free visual conditioning method.

Findings

01  Achieves state-of-the-art zero-shot 3D generation performance.

02  Trains on WebVi3D, a curated dataset of 320 million frames from 16 million video clips.

03  Outperforms models trained on traditional 3D datasets.
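As a quick sanity check on scale, the reported counts (320 million frames from 16 million clips) imply an average of 20 frames per curated clip. The snippet below only does that arithmetic; it says nothing about how frames are actually sampled per clip, which this page does not state.

```python
# Reported WebVi3D totals from the abstract.
frames = 320_000_000
clips = 16_000_000

# Average number of frames retained per clip (integer division, since both are exact multiples).
avg_frames_per_clip = frames // clips
print(avg_frames_per_clip)  # 20
```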

Abstract

Recent 3D generation models typically rely on limited-scale 3D 'gold-labels' or 2D diffusion priors for 3D content creation. However, their performance is upper-bounded by constrained 3D priors due to the lack of scalable learning paradigms. In this work, we present See3D, a visual-conditional multi-view diffusion model trained on large-scale Internet videos for open-world 3D creation. The model aims to Get 3D knowledge by solely Seeing the visual contents from the vast and rapidly growing video data: You See it, You Got it. To achieve this, we first scale up the training data using a proposed data curation pipeline that automatically filters out multi-view inconsistencies and insufficient observations from source videos. This results in a high-quality, richly diverse, large-scale dataset of multi-view images, termed WebVi3D, containing 320M frames from 16M video clips. Nevertheless,…
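The abstract names two things the curation pipeline rejects: multi-view inconsistencies and insufficient observations. The sketch below illustrates that idea as simple per-clip threshold filtering. It is a hypothetical toy, not See3D's actual pipeline: the `ClipStats` fields (`dynamic_ratio`, `view_change`), the `curate` function, and both thresholds are assumptions made up for illustration, and the paper's real criteria and detectors are not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClipStats:
    """Hypothetical per-clip statistics; not fields from the See3D codebase."""
    clip_id: str
    dynamic_ratio: float  # hypothetical: fraction of content moving independently of the camera
    view_change: float    # hypothetical: accumulated camera viewpoint change over the clip (degrees)


def curate(clips: List[ClipStats],
           max_dynamic: float = 0.1,        # hypothetical threshold
           min_view_change: float = 15.0,   # hypothetical threshold
           ) -> List[str]:
    """Return IDs of clips that look static enough and cover enough viewpoints."""
    kept = []
    for c in clips:
        if c.dynamic_ratio > max_dynamic:
            continue  # scene content moves, so views are likely multi-view inconsistent
        if c.view_change < min_view_change:
            continue  # camera barely moves, so the scene is insufficiently observed
        kept.append(c.clip_id)
    return kept


clips = [
    ClipStats("a", dynamic_ratio=0.02, view_change=40.0),  # static scene, good camera sweep
    ClipStats("b", dynamic_ratio=0.35, view_change=50.0),  # too much independent motion
    ClipStats("c", dynamic_ratio=0.01, view_change=3.0),   # camera nearly still
]
print(curate(clips))  # ['a']
```

Framing the filter as independent rejection rules keeps each failure mode (moving content versus too little camera motion) separately tunable, which is one plausible reason to describe the pipeline in terms of those two criteria.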


Code & Models

Repositories

baaivision/See3D (PyTorch, official)

Models

bruiiii/See3D (Hugging Face model)


Taxonomy

Topics: Educational Games and Gamification

Methods: Diffusion