Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

Mukund Varma T; Peihao Wang; Zhiwen Fan; Zhangyang Wang; Hao Su; Ravi Ramamoorthi

arXiv:2403.18922 · cs.CV · March 29, 2024 · 1 citation

TL;DR

Lift3D is a zero-shot approach that extends 2D vision models to 3D, enabling consistent multi-view predictions across various tasks without task-specific training.

Contribution

The paper introduces Lift3D, a novel method that generalizes 2D vision models to 3D, achieving zero-shot multi-view consistency for diverse vision tasks.

Findings

01 Outperforms task-specific 3D methods in several tasks

02 Works with models like DINO and CLIP without retraining

03 Enables 3D predictions for style transfer, segmentation, and more

Abstract

In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets. At the same time, there has been renewed interest in 3D scene representations such as neural radiance fields from multi-view images. However, the availability of 3D or multiview data is still substantially limited compared to 2D image datasets, making extending 2D vision models to 3D data highly desirable but also very challenging. Indeed, extending a single 2D vision operator like scene editing to 3D typically requires a highly creative method specialized to that task and often requires per-scene optimization. In this paper, we ask the question of whether any 2D vision model can be lifted to make 3D consistent predictions. We answer this question in the affirmative; our new Lift3D method trains to…

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Image Enhancement Techniques

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Softmax · Multi-Head Attention · Dense Connections · Vision Transformer · self-DIstillation with NO labels