Locally-Supervised Deep Hybrid Model for Scene Recognition
Sheng Guo, Weilin Huang, Limin Wang, Yu Qiao

TL;DR
This paper introduces a hybrid model that strengthens local convolutional features with additional supervision and encodes them into a fixed-length vector, improving scene recognition accuracy on the MIT Indoor67 and SUN397 datasets.
Contribution
It proposes a Local Convolutional Supervision layer and a Fisher Convolutional Vector that make better use of local convolutional features in scene recognition.
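To make the idea of encoding local convolutional features into a fixed-length vector concrete, here is a minimal sketch of a standard Fisher vector encoder applied to the spatial positions of a convolutional feature map. It is not the paper's Fisher Convolutional Vector implementation, whose details are not given in this summary. The function names `feature_map_to_descriptors` and `fisher_vector` are hypothetical, the diagonal-covariance GMM parameters are assumed to be fitted beforehand, and the power and L2 normalization steps are the common "improved Fisher vector" choices rather than something confirmed by the source.

```python
import numpy as np

def feature_map_to_descriptors(fmap):
    """Treat each spatial position of a (C, H, W) feature map as one
    C-dimensional local descriptor; returns an (H*W, C) array."""
    c = fmap.shape[0]
    return fmap.reshape(c, -1).T

def fisher_vector(descriptors, weights, means, variances):
    """Encode N local descriptors (N, D) into one vector of length 2*K*D.

    weights (K,), means (K, D), variances (K, D) describe a diagonal GMM
    that is assumed to be already fitted (hypothetical inputs).
    """
    x = np.asarray(descriptors, dtype=np.float64)
    n = x.shape[0]
    diff = x[:, None, :] - means[None, :, :]               # (N, K, D)
    # Log-density of every descriptor under every Gaussian component.
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_prob                  # (N, K)
    log_post -= log_post.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                # soft assignments

    z = diff / np.sqrt(variances)                          # whitened residuals
    # Gradients with respect to the component means and standard deviations.
    grad_mu = np.einsum('nk,nkd->kd', post, z)
    grad_mu /= n * np.sqrt(weights)[:, None]
    grad_sigma = np.einsum('nk,nkd->kd', post, z ** 2 - 1)
    grad_sigma /= n * np.sqrt(2 * weights)[:, None]

    fv = np.concatenate([grad_mu.ravel(), grad_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                 # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv                   # L2 normalization
```

The output length depends only on the number of GMM components K and the channel count D, so feature maps of different spatial sizes map to vectors of the same length.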
Findings
Achieves 83.75% accuracy on the MIT Indoor67 dataset.
Achieves 67.56% accuracy on the SUN397 dataset.
Outperforms previous state-of-the-art methods.
Abstract
Convolutional neural networks (CNN) have recently achieved remarkable successes in various image classification and understanding tasks. The deep features obtained at the top fully-connected layer of the CNN (FC-features) exhibit rich global semantic information and are extremely effective in image classification. On the other hand, the convolutional features in the middle layers of the CNN also contain meaningful local information, but are not fully explored for image representation. In this paper, we propose a novel Locally-Supervised Deep Hybrid Model (LS-DHM) that effectively enhances and explores the convolutional features for scene recognition. Firstly, we notice that the convolutional features capture local objects and fine structures of scene images, which yield important cues for discriminating ambiguous scenes, whereas these features are significantly eliminated in the…
