Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition
Kevin Wu, Eric Wu, Gabriel Kreiman

TL;DR
This paper introduces GistNet, a biologically inspired CNN that uses a scene's 'gist' to improve object recognition accuracy by up to 50%, mirroring how humans combine foveal and peripheral vision.
Contribution
The paper presents a novel two-part CNN model that incorporates scene context, demonstrating significant accuracy improvements with minimal increase in model size.
Findings
GistNet improves object recognition accuracy by up to 50%.
Incorporating scene context enhances recognition performance.
The model mimics human visual processing by using the scene's 'gist'.
Abstract
Advancements in convolutional neural networks (CNNs) have made significant strides toward achieving high performance levels on multiple object recognition tasks. While some approaches utilize information from the entire scene to propose regions of interest, the task of interpreting a particular region or object is still performed independently of other objects and features in the image. Here we demonstrate that a scene's 'gist' can significantly contribute to how well humans can recognize objects. These findings are consistent with the notion that humans foveate on an object and incorporate information from the periphery to aid in recognition. We use a biologically inspired two-part convolutional neural network ('GistNet') that models the fovea and periphery to provide a proof-of-principle demonstration that computational object recognition can significantly benefit from the gist of the…
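The fovea/periphery split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nested-list image format, the average-pooling step, and the use of concatenation to fuse the two views are all assumptions made for illustration, and the raw pixel values stand in for the features that the two CNN branches would compute.

```python
# Hypothetical sketch of a two-view input pipeline: a sharp "fovea" crop
# around the object plus a coarse "periphery" view of the whole scene.
# Images are plain nested lists of grayscale values (an assumption).

def crop_fovea(image, box):
    """Return the object region given box = (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def downsample_periphery(image, factor):
    """Average-pool the full scene by `factor` to get a coarse gist view."""
    rows, cols = len(image), len(image[0])
    pooled = []
    # Ignore trailing rows/columns that do not fill a complete block.
    for r in range(0, rows - rows % factor, factor):
        pooled_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor) for dc in range(factor)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def flatten(grid):
    """Flatten a 2-D nested list into a 1-D list, row by row."""
    return [value for row in grid for value in row]


def two_view_features(image, box, factor):
    """Concatenate fovea and periphery values (assumed fusion strategy)."""
    fovea = flatten(crop_fovea(image, box))
    gist = flatten(downsample_periphery(image, factor))
    return fovea + gist


# Usage: a 4x4 toy scene with a 2x2 object in the middle.
scene = [[r * 4 + c for c in range(4)] for r in range(4)]
features = two_view_features(scene, box=(1, 1, 2, 2), factor=2)
print(features)  # [5, 6, 9, 10, 2.5, 4.5, 10.5, 12.5]
```

In a real model each view would pass through its own convolutional branch before fusion. The sketch only shows how one image yields both a detailed object view and a low-resolution context view.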
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
