What's in my Room? Object Recognition on Indoor Panoramic Images
Julia Guerrero-Viu, Clara Fernandez-Labrador, Cédric Demonceaux, and Jose J. Guerrero

TL;DR
This paper presents a deep learning-based system for object detection and semantic segmentation in indoor panoramic images. It converts the 2D results into 3D bounding boxes placed in a 3D model of the room, and the authors report that it outperforms the state of the art by a large margin.
Contribution
It introduces a novel adaptation of deep learning models for equirectangular images, enabling accurate object recognition and 3D bounding box generation in indoor scenes.
Findings
Outperforms state-of-the-art methods significantly
Provides accurate 3D object localization in indoor panoramas
Demonstrates comprehensive understanding of indoor objects
Abstract
In the last few years, there has been a growing interest in taking advantage of the 360 panoramic images potential, while managing the new challenges they imply. While several tasks have been improved thanks to the contextual information these images offer, object recognition in indoor scenes still remains a challenging problem that has not been deeply investigated. This paper provides an object recognition system that performs object detection and semantic segmentation tasks by using a deep learning model adapted to match the nature of equirectangular images. From these results, instance segmentation masks are recovered, refined and transformed into 3D bounding boxes that are placed into the 3D model of the room. Quantitative and qualitative results support that our method outperforms the state of the art by a large margin and show a complete understanding of the main objects in indoor…
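The abstract does not say how the 2D results are lifted into 3D. As a generic illustration of the equirectangular geometry that such a step relies on, here is a minimal sketch that maps a panorama pixel to a unit viewing direction. The function name `pixel_to_ray` and the axis convention (y up, z forward, image center looking forward) are assumptions made for this example, not the authors' implementation.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit direction vector.

    Assumed convention (illustrative, not taken from the paper):
    - u runs left to right and covers longitude -pi to pi
    - v runs top to bottom and covers latitude pi/2 to -pi/2
    - y points up, z points forward (toward the image center)
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # horizontal angle
    lat = (0.5 - v / height) * math.pi        # vertical angle
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)
```

With a 1024 x 512 panorama, `pixel_to_ray(512, 256, 1024, 512)` returns the forward direction `(0.0, 0.0, 1.0)`. Intersecting such rays with a room layout is one common way to place 2D detections in 3D, though the source does not confirm that this is the approach used here.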
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
