Learning Scene Gist with Convolutional Neural Networks to Improve Object Recognition

Kevin Wu; Eric Wu; Gabriel Kreiman

arXiv:1803.01967 · cs.CV · June 12, 2018 · 5 cites

Open Access

TL;DR

This paper introduces GistNet, a biologically inspired CNN that leverages scene 'gist' to enhance object recognition accuracy by up to 50%, mimicking human visual processing.

Contribution

The paper presents a novel two-part CNN model that incorporates scene context, demonstrating significant accuracy improvements with minimal increase in model size.
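To make "minimal increase in model size" concrete, the following arithmetic counts parameters for a purely hypothetical configuration: a small fovea network plus an added periphery branch whose pooled features are concatenated with the fovea features before the classifier. The layer sizes, the concatenation-based fusion, and the helper names `conv_params` and `dense_params` are illustrative assumptions, not GistNet's actual architecture or reported numbers.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical fovea-only baseline: two conv layers, global pooling, classifier.
NUM_CLASSES = 10
base_total = (
    conv_params(3, 3, 64)
    + conv_params(3, 64, 128)
    + dense_params(128, NUM_CLASSES)  # global pooling adds no parameters
)

# Hypothetical periphery branch: one narrow conv layer whose pooled 16-dim
# output is concatenated with the 128-dim fovea features (assumed fusion).
periphery_branch = conv_params(3, 3, 16)
wider_classifier = (
    dense_params(128 + 16, NUM_CLASSES) - dense_params(128, NUM_CLASSES)
)
extra_total = periphery_branch + wider_classifier

# prints: base=76938, extra=608, increase=0.79%
print(f"base={base_total}, extra={extra_total}, "
      f"increase={100 * extra_total / base_total:.2f}%")
```

With these made-up sizes the context branch adds under 1% of the parameters; the point is how the relative increase is computed, not the paper's own figure.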

Findings

01. GistNet improves object recognition accuracy by up to 50%.

02. Incorporating scene context enhances recognition performance.

03. The model mimics human visual processing by using scene 'gist'.

Abstract

Advancements in convolutional neural networks (CNNs) have made significant strides toward achieving high performance levels on multiple object recognition tasks. While some approaches utilize information from the entire scene to propose regions of interest, the task of interpreting a particular region or object is still performed independently of other objects and features in the image. Here we demonstrate that a scene's 'gist' can significantly contribute to how well humans can recognize objects. These findings are consistent with the notion that humans foveate on an object and incorporate information from the periphery to aid in recognition. We use a biologically inspired two-part convolutional neural network ('GistNet') that models the fovea and periphery to provide a proof-of-principle demonstration that computational object recognition can significantly benefit from the gist of the…
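One way to picture the fovea/periphery split is as two inputs built from the same image: a full-resolution crop of the object and a coarse view of the surrounding scene. This minimal sketch assumes a grayscale image stored as nested lists and a known bounding box; it further assumes that the object is masked out of the peripheral view and that low peripheral acuity can be approximated by average pooling. These are illustrative choices rather than details taken from the paper, and `foveal_crop` and `peripheral_view` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[float]]      # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), exclusive ends


def foveal_crop(img: Image, box: Box) -> Image:
    """Full-resolution object region (the 'fovea' input)."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def peripheral_view(img: Image, box: Box, factor: int,
                    fill: float = 0.0) -> Image:
    """Coarse scene context (the 'periphery' input).

    Assumption: the object region is replaced by `fill`, then the image is
    average-pooled over non-overlapping factor x factor blocks.
    """
    top, left, bottom, right = box
    masked = [
        [fill if top <= r < bottom and left <= c < right else v
         for c, v in enumerate(row)]
        for r, row in enumerate(img)
    ]
    h, w = len(masked) // factor, len(masked[0]) // factor
    pooled = []
    for i in range(h):
        out_row = []
        for j in range(w):
            block = [masked[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            out_row.append(sum(block) / len(block))
        pooled.append(out_row)
    return pooled


scene = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
box = (1, 1, 3, 3)
print(foveal_crop(scene, box))         # [[6, 7], [10, 11]]
print(peripheral_view(scene, box, 2))  # [[2.0, 3.75], [9.0, 10.75]]
```

In a two-part network, each of these inputs would feed its own convolutional stream, so the object is classified with the surrounding scene available as context rather than in isolation.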

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning