ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
Wufei Ma, Guanning Zeng, Guofeng Zhang, Qihao Liu, Letian Zhang, Adam Kortylewski, Yaoyao Liu, Alan Yuille

TL;DR
This paper introduces ImageNet3D, a large dataset with 2D and 3D object annotations for 200 ImageNet categories, enabling the development and analysis of models that infer 3D object information from 2D images.
Contribution
The creation of the ImageNet3D dataset, with detailed 3D annotations for diverse categories, to facilitate research on general-purpose 3D understanding and reasoning in vision models.
Findings
Models trained on ImageNet3D show improved 3D awareness.
The dataset enables new tasks such as open-vocabulary pose estimation.
Experimental results demonstrate enhanced 3D understanding capabilities.
Abstract
A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is a challenging task, as it involves inferring 3D information from 2D signals and, most importantly, generalizing to rigid objects from unseen categories. However, existing datasets with object-level 3D annotations are often limited by the number of categories or the quality of annotations. Models developed on these datasets become specialists for certain categories or domains and fail to generalize. In this work, we present ImageNet3D, a large dataset for general-purpose object-level 3D understanding. ImageNet3D augments 200 categories from the ImageNet dataset with 2D bounding box, 3D pose, 3D location annotations, and image captions…
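The abstract lists the per-object quantities a general-purpose model should infer: a class name and 2D bounding box, plus a 3D location and 3D viewpoint. The sketch below shows one way such a record might be represented and how a viewpoint could be turned into a rotation matrix. It is a minimal illustration under stated assumptions: the class `ObjectAnnotation`, its field names, the azimuth/elevation/in-plane-rotation parameterization, and the PASCAL3D+-style rotation convention are all hypothetical and are not taken from ImageNet3D's released annotation format.

```python
import math
from dataclasses import dataclass

Matrix = list[list[float]]


@dataclass
class ObjectAnnotation:
    """Hypothetical per-object record; field names and units are illustrative assumptions."""
    class_name: str                               # 2D: category label
    bbox_xyxy: tuple[float, float, float, float]  # 2D: (x_min, y_min, x_max, y_max) in pixels
    azimuth: float                                # 3D viewpoint, radians (assumed parameterization)
    elevation: float                              # 3D viewpoint, radians
    theta: float                                  # 3D viewpoint: in-plane rotation, radians
    distance: float                               # 3D location: object-to-camera distance (assumed)
    center_xy: tuple[float, float]                # 3D location: projected object center in pixels (assumed)
    caption: str = ""                             # free-form caption text


def rot_z(a: float) -> Matrix:
    """Rotation by angle a (radians) about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a: float) -> Matrix:
    """Rotation by angle a (radians) about the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def viewpoint_to_rotation(ann: ObjectAnnotation) -> Matrix:
    """Compose the three viewpoint angles into a 3x3 rotation matrix.

    Assumed PASCAL3D+-style convention:
        R = Rz(theta) @ Rx(-(pi/2 - elevation)) @ Rz(-azimuth)
    """
    return matmul(rot_z(ann.theta),
                  matmul(rot_x(-(math.pi / 2 - ann.elevation)), rot_z(-ann.azimuth)))


if __name__ == "__main__":
    ann = ObjectAnnotation("car", (10.0, 20.0, 110.0, 80.0), azimuth=0.3, elevation=0.2,
                           theta=0.1, distance=5.0, center_xy=(60.0, 50.0))
    for row in viewpoint_to_rotation(ann):
        print([round(v, 3) for v in row])
```

Because each factor is an elementary rotation, the product is always a proper rotation (orthonormal, determinant 1), which is a quick sanity check to run on any viewpoint-to-rotation conversion whose exact axis conventions are uncertain.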
Taxonomy
Topics: Advanced Neural Network Applications · 3D Surveying and Cultural Heritage · Advanced Image and Video Retrieval Techniques
