DeepGaze II: Reading fixations from deep features trained on object recognition

Matthias Kümmerer; Thomas S. A. Wallis; Matthias Bethge

arXiv:1610.01563·cs.CV·October 6, 2016·262 cites

TL;DR

DeepGaze II predicts where people fixate in images using features from a VGG-19 network pre-trained on object recognition. The VGG features are not fine-tuned; only a few readout layers are trained on top of them, which shows how well object recognition features transfer to saliency prediction.

Contribution

The paper introduces DeepGaze II, a saliency model that predicts fixations from pre-trained VGG-19 features and trains only a few readout layers on top of them, making it a strong test of transfer learning.

Findings

01

After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations.

02

The model achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark.

03

Deep features trained on object recognition are highly effective for saliency prediction.
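
The 87% figure is a ratio of information gains rather than a plain accuracy. As a minimal sketch, assuming information gain is the average log-likelihood difference per fixation between a model and a baseline, and that the "explainable" part is the gain of a gold-standard model over the same baseline, the arithmetic looks like the following. The function names and all probability values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def information_gain(model_probs, baseline_probs):
    """Average log-likelihood gain (bits per fixation) of a model over a baseline.

    Each list holds the probability a model assigned to the location of one
    observed fixation.
    """
    assert len(model_probs) == len(baseline_probs)
    gains = [math.log2(p) - math.log2(q)
             for p, q in zip(model_probs, baseline_probs)]
    return sum(gains) / len(gains)


def information_gain_explained(model_probs, gold_probs, baseline_probs):
    """Fraction of the gold standard's gain over baseline that the model reaches."""
    ig_model = information_gain(model_probs, baseline_probs)
    ig_gold = information_gain(gold_probs, baseline_probs)
    return ig_model / ig_gold


# Hypothetical probabilities assigned to three observed fixations.
baseline = [0.01, 0.01, 0.01]   # assumed image-independent baseline
gold = [0.04, 0.04, 0.04]       # assumed gold-standard model
model = [0.02, 0.02, 0.02]      # assumed saliency model

ratio = information_gain_explained(model, gold, baseline)
print(f"information gain explained: {ratio:.2f}")
```

With these made-up numbers the model gains 1 bit per fixation over the baseline and the gold standard gains 2 bits, so the model explains about half of the explainable information gain. A value of 87% would mean the model closes most of the gap between the baseline and the gold standard.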

Abstract

Here we present DeepGaze II, a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network trained to identify objects in images. Contrary to other saliency models that use deep features, here we use the VGG features for saliency prediction with no additional fine-tuning (rather, a few readout layers are trained on top of the VGG features to predict saliency). The model is therefore a strong test of transfer learning. After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark. These results corroborate the finding from DeepGaze I (which explained 56% of the explainable information gain), that deep features trained on object recognition provide a versatile feature space…
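
The transfer-learning setup described in the abstract can be illustrated with a toy sketch: a feature extractor stays fixed while only readout parameters are updated to raise the likelihood of observed fixations. Everything here is an assumption made for illustration: the hand-written feature vectors stand in for VGG-19 activations, a single linear readout followed by a softmax over pixels stands in for the paper's "few readout layers", and the function names, learning rate, and fixation data are hypothetical.

```python
import math


def readout_density(features, weights):
    """Map fixed per-pixel features to a fixation probability for each pixel."""
    logits = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neg_log_likelihood(density, fixation_indices):
    """Average negative log-probability of the observed fixations."""
    return -sum(math.log(density[i]) for i in fixation_indices) / len(fixation_indices)


def train_step(features, weights, fixation_indices, lr=0.5):
    """One gradient step on the readout weights; the features are never modified."""
    density = readout_density(features, weights)
    dim = len(weights)
    # Gradient of the loss: expected features under the model minus observed features.
    expected = [sum(p * feat[k] for p, feat in zip(density, features))
                for k in range(dim)]
    observed = [sum(features[i][k] for i in fixation_indices) / len(fixation_indices)
                for k in range(dim)]
    return [w - lr * (e - o) for w, e, o in zip(weights, expected, observed)]


# Hypothetical frozen features for four pixels (stand-ins for deep features).
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
features_before = [list(feat) for feat in FEATURES]
fixations = [2, 2, 0]  # hypothetical indices of fixated pixels

weights = [0.0, 0.0]   # the only trainable parameters in this sketch
loss_before = neg_log_likelihood(readout_density(FEATURES, weights), fixations)
for _ in range(50):
    weights = train_step(FEATURES, weights, fixations)
density = readout_density(FEATURES, weights)
loss_after = neg_log_likelihood(density, fixations)

print(f"loss before: {loss_before:.3f}, loss after: {loss_after:.3f}")
```

The design point is that gradients flow only into `weights`; the feature vectors are read but never changed, which is what makes a setup like this a test of how useful the pre-trained features already are for predicting fixations.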

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: Visual Geometry Group 19 Layer CNN (VGG-19) · Dropout · Dense Connections · Max Pooling · Softmax · Convolution