FOSNet: An End-to-End Trainable Deep Neural Network for Scene Recognition
Hongje Seong, Junhyuk Hyun, Euntai Kim

TL;DR
FOSNet is a deep neural network that combines object and scene information with a novel scene coherence loss to improve scene recognition accuracy across multiple datasets.
Contribution
The paper introduces FOSNet, a CNN framework that fuses object and scene features and employs a new scene coherence loss for enhanced recognition performance.
Findings
Achieved state-of-the-art accuracy on Places 2 (60.14%) and MIT indoor 67 (90.37%).
Also evaluated on SUN 397; the abstract claims state-of-the-art accuracy only on the other two datasets.
Demonstrated the effectiveness of scene coherence loss in scene recognition.
Abstract
Scene recognition is an image recognition problem that aims to predict the category of the place where an image was taken. This paper proposes a new scene recognition method based on a convolutional neural network (CNN). The method fuses the object and scene information in a given image, and the resulting CNN framework is named FOSNet (FOS stands for fusion of object and scene). In addition, a new loss, the scene coherence loss (SCL), is developed to train FOSNet and to improve scene recognition performance. The SCL builds on two traits unique to scenes: 'sceneness' spreads across the image, and the scene class does not change from one part of the image to another. FOSNet was evaluated on the three most popular scene recognition datasets and achieves state-of-the-art performance on two of them: 60.14% on Places 2 and 90.37% on MIT indoor 67. The second…
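The abstract names two ideas without giving formulas: fusing object and scene information, and a loss that rewards a scene class staying constant across the image. The sketch below is one illustrative reading of those ideas, not the paper's formulation. The function names (`fuse_scores`, `coherence_loss`), the weighted-average fusion, the squared-difference penalty between neighboring grid cells, and the parameters `object_weight` and `smooth_weight` are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Convert a list of raw class scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(object_scores, scene_scores, object_weight=0.5):
    """Hypothetical late fusion: weighted average of two per-class score lists.

    The paper's actual fusion scheme is not described in the abstract.
    """
    return [object_weight * o + (1.0 - object_weight) * s
            for o, s in zip(object_scores, scene_scores)]


def coherence_loss(score_map, label, smooth_weight=1.0):
    """Illustrative coherence-style loss on a grid of per-class scores.

    score_map[y][x] is a list of class scores at grid cell (y, x).
    The loss is the mean cross-entropy over all cells (every cell should
    predict the same scene label) plus a penalty on the squared difference
    between class probabilities of horizontally and vertically adjacent cells.
    """
    height, width = len(score_map), len(score_map[0])
    probs = [[softmax(cell) for cell in row] for row in score_map]

    # Classification term: every location is asked to predict the scene label.
    ce = -sum(math.log(probs[y][x][label])
              for y in range(height) for x in range(width))
    ce /= height * width

    # Coherence term: neighboring locations should agree with each other.
    penalty, pairs = 0.0, 0
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < height and nx < width:
                    penalty += sum((a - b) ** 2
                                   for a, b in zip(probs[y][x], probs[ny][nx]))
                    pairs += 1
    if pairs:
        penalty /= pairs
    return ce + smooth_weight * penalty
```

Under these assumptions, a score map whose cells all agree incurs no coherence penalty, while a map whose neighboring cells disagree is penalized in proportion to `smooth_weight`.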
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
