Joint 2D-3D-Semantic Data for Indoor Scene Understanding
Iro Armeni, Sasha Sax, Amir R. Zamir, Silvio Savarese

TL;DR
This paper introduces a comprehensive large-scale indoor dataset with synchronized 2D, 2.5D, and 3D data, including semantic annotations, to facilitate advanced indoor scene understanding and cross-modal learning.
Contribution
The paper provides a new large-scale, richly annotated indoor dataset with multiple modalities, enabling joint and cross-modal learning approaches for scene understanding.
Findings
Dataset covers over 6,000 m² of indoor space.
Contains over 70,000 RGB images with depth, normals, and semantic labels.
Includes registered 3D meshes and point clouds.
Abstract
We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. The dataset covers over 6,000 m² and contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360° equirectangular images) as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces. The dataset is available here: http://3Dsemantics.stanford.edu/
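To illustrate how a 360° equirectangular depth image can relate to a global XYZ image, the sketch below back-projects a single pixel to a 3D point. It is a minimal illustration and not the dataset's actual tooling: the function names, the axis convention (x right, y up, z forward), the treatment of depth as Euclidean distance along the viewing ray in meters, and the identity camera rotation are all assumptions made here. The dataset's own documentation should be consulted for its real conventions and encodings.

```python
import math


def equirect_pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit ray direction.

    Assumed convention (hypothetical, not taken from the dataset):
    longitude runs from -pi (left edge) to +pi (right edge),
    latitude runs from +pi/2 (top edge) to -pi/2 (bottom edge),
    with x pointing right, y up, and z forward.
    """
    lon = ((u + 0.5) / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - (v + 0.5) / height) * math.pi
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def backproject(u, v, depth, width, height, cam_pos=(0.0, 0.0, 0.0)):
    """Return the 3D point seen at pixel (u, v).

    Assumes `depth` is the distance along the ray in meters and that the
    camera is axis-aligned with the global frame (identity rotation), so
    only the camera position `cam_pos` is applied.
    """
    dx, dy, dz = equirect_pixel_to_ray(u, v, width, height)
    return (
        cam_pos[0] + depth * dx,
        cam_pos[1] + depth * dy,
        cam_pos[2] + depth * dz,
    )
```

Applying `backproject` to every pixel of a depth panorama would yield a per-pixel map of 3D coordinates, which is the kind of information a global XYZ image stores. A real pipeline would also apply the camera rotation from the provided camera information.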
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Robotics and Sensor-Based Localization · Remote Sensing and LiDAR Applications
