A Deep Learning-based Global and Segmentation-based Semantic Feature Fusion Approach for Indoor Scene Classification
Ricardo Pereira, Tiago Barros, Luis Garrote, Ana Lopes, Urbano J. Nunes

TL;DR
This paper introduces a deep learning method that fuses segmentation-based semantic features with CNN-based global features for indoor scene classification, achieving state-of-the-art results on the SUN RGB-D and NYU Depth V2 benchmarks.
Contribution
It presents a novel fusion approach using semantic segmentation masks and a two-branch CNN architecture for improved indoor scene classification.
Findings
GS2F2App achieved state-of-the-art results on the SUN RGB-D dataset.
It also outperformed existing methods on the NYU Depth V2 dataset.
The results demonstrate the effectiveness of semantic feature fusion for scene classification.
Abstract
This work proposes a novel approach that uses a semantic segmentation mask to obtain a 2D spatial layout of the segmentation categories across the scene, designated as segmentation-based semantic features (SSFs). These features represent, per segmentation category, the pixel count, as well as the 2D average position and respective standard deviation values. Moreover, a two-branch network, GS2F2App, that exploits CNN-based global features extracted from RGB images and the segmentation-based features extracted from the proposed SSFs, is also proposed. GS2F2App was evaluated on two indoor scene benchmark datasets, the SUN RGB-D and the NYU Depth V2, achieving state-of-the-art results on both.
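To make the SSF description concrete, the following is a minimal sketch of how the per-category statistics named in the abstract (pixel count, 2D average position, and standard deviation) could be computed from a segmentation mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `segmentation_semantic_features`, the feature ordering, and the use of raw (unnormalized) pixel coordinates are all hypothetical choices, and the paper may normalize or arrange these values differently.

```python
import numpy as np


def segmentation_semantic_features(mask, num_categories):
    """Compute per-category statistics from a 2D segmentation mask.

    mask: (H, W) integer array of category ids in [0, num_categories).
    Returns a (num_categories, 5) array with columns
    [pixel_count, mean_x, mean_y, std_x, std_y].
    The column order and raw pixel units are assumptions for illustration.
    """
    height, width = mask.shape
    # rows[i, j] == i and cols[i, j] == j for every pixel of the mask.
    rows, cols = np.indices((height, width))
    features = np.zeros((num_categories, 5), dtype=np.float64)

    for category in range(num_categories):
        selected = mask == category
        count = int(selected.sum())
        if count == 0:
            # Category absent from the scene: leave its row as zeros.
            continue
        x = cols[selected]
        y = rows[selected]
        features[category] = [count, x.mean(), y.mean(), x.std(), y.std()]

    return features


# Toy 2x3 mask with two categories present and a third one absent.
toy_mask = np.array([[0, 0, 1],
                     [0, 1, 1]])
ssf = segmentation_semantic_features(toy_mask, num_categories=3)
print(ssf.shape)  # (3, 5)
```

In a two-branch setup such as the one the abstract describes, an array like `ssf` would feed the segmentation-based branch, while the RGB image feeds the global-feature branch. How the two branches are combined is not specified in this summary, so it is not sketched here.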
Taxonomy
Topics: Video Surveillance and Tracking Methods · Remote Sensing and LiDAR Applications · Advanced Image and Video Retrieval Techniques
