Bridging the Dimensionality Gap: A Taxonomy and Survey of 2D Vision Model Adaptation for 3D Analysis
Akshat Pandya, Bhavuk Jain

TL;DR
This survey categorizes and analyzes strategies for adapting 2D vision models (CNNs and ViTs) to 3D data such as point clouds and meshes, organized around the core challenge of bridging the gap between regular, dense 2D image grids and irregular, sparse 3D structures.
Contribution
It provides a unified taxonomy of 3D adaptation methods, analyzing their trade-offs and outlining future research directions in 3D vision model development.
Findings
Data-centric methods project 3D data into 2D formats for model reuse.
Architecture-centric methods design specialized 3D networks.
Hybrid methods combine 2D and 3D modeling paradigms.
Abstract
The remarkable success of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in 2D vision has spurred significant research in extending these architectures to the complex domain of 3D analysis. Yet, a core challenge arises from a fundamental dichotomy between the regular, dense grids of 2D images and the irregular, sparse nature of 3D data such as point clouds and meshes. This survey provides a comprehensive review and a unified taxonomy of adaptation strategies that bridge this gap, classifying them into three families: (1) Data-centric methods that project 3D data into 2D formats to leverage off-the-shelf 2D models, (2) Architecture-centric methods that design intrinsic 3D networks, and (3) Hybrid methods, which synergistically combine the two modeling paradigms to benefit from both rich visual priors of large 2D datasets and explicit geometric reasoning of 3D models.…
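As a rough illustration of the data-centric family, the sketch below orthographically projects a point cloud onto a regular depth image that an off-the-shelf 2D model could then consume. It is a minimal example under stated assumptions, not a method taken from the survey: the function name `point_cloud_to_depth_image`, the `resolution` parameter, the choice of the z-axis as the viewing direction, and the nearest-point-per-pixel rule are all hypothetical choices made here for illustration.

```python
import numpy as np


def point_cloud_to_depth_image(points, resolution=32):
    """Project an (N, 3) point cloud along the z-axis into a square depth image.

    Hypothetical helper for illustration only. Returns (depth, mask), where
    mask marks the pixels that received at least one point.
    """
    points = np.asarray(points, dtype=np.float64)
    xy, z = points[:, :2], points[:, 2]

    # Normalize x and y into [0, 1] so they can index a regular pixel grid.
    mins = xy.min(axis=0)
    span = xy.max(axis=0) - mins
    span[span == 0] = 1.0  # avoid division by zero for degenerate extents
    uv = (xy - mins) / span

    # Convert normalized coordinates to integer pixel indices.
    cols = np.minimum((uv[:, 0] * resolution).astype(int), resolution - 1)
    rows = np.minimum((uv[:, 1] * resolution).astype(int), resolution - 1)

    # Keep the smallest z per pixel (the point nearest to a viewer at -z).
    depth = np.full((resolution, resolution), np.inf)
    np.minimum.at(depth, (rows, cols), z)

    # Irregular, sparse input leaves many pixels empty; record them in a mask.
    mask = np.isfinite(depth)
    depth[~mask] = 0.0
    return depth, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-1.0, 1.0, size=(500, 3))  # synthetic stand-in data
    depth, mask = point_cloud_to_depth_image(cloud, resolution=16)
    print(depth.shape, int(mask.sum()), "occupied pixels")
```

The mask makes the trade-off visible: the output is a dense grid that 2D networks expect, but points hidden behind the nearest surface are discarded, which is the kind of geometric information that architecture-centric and hybrid methods aim to retain.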
