DeepGaze II: Reading fixations from deep features trained on object recognition
Matthias Kümmerer, Thomas S. A. Wallis, Matthias Bethge

TL;DR
DeepGaze II predicts where people look in images using VGG-19 features pre-trained for object recognition. The VGG features themselves are not fine-tuned; only a few readout layers are trained on top of them, and the model still explains about 87% of the explainable information gain in fixation patterns.
Contribution
The paper introduces DeepGaze II, a saliency model that keeps pre-trained VGG-19 features fixed and trains only a few readout layers to predict fixations, which makes it a strong test of transfer learning.
Findings
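After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in fixation patterns, up from 56% for DeepGaze I.
The model achieves top performance in area-under-the-curve metrics on the MIT300 hold-out benchmark.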
Deep features trained on object recognition are highly effective for saliency prediction.
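The "explainable information gain" figure can be made concrete with a small sketch. The summary here does not define the metric, so the definition below is an assumption: information gain is taken as a model's average log-likelihood per fixation minus that of a baseline model, and the explained fraction divides this by the gain of a gold-standard model. The function names and the toy 2x2 densities are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_log_likelihood(density, fixations):
    """Average log2-probability that a density map assigns to fixated pixels."""
    fix = np.asarray(fixations)
    return float(np.mean(np.log2(density[fix[:, 0], fix[:, 1]])))

def information_gain_explained(ll_model, ll_baseline, ll_gold):
    """Fraction of the gold standard's gain over the baseline that the model reaches.

    All inputs are average log-likelihoods per fixation in the same log base.
    """
    explainable = ll_gold - ll_baseline
    if explainable <= 0:
        raise ValueError("gold standard must outperform the baseline")
    return (ll_model - ll_baseline) / explainable

# Toy data (assumed, not from the paper): three fixations on a 2x2 image.
fixations = [(0, 0), (0, 0), (0, 1)]
baseline = np.full((2, 2), 0.25)                # uniform baseline density
model = np.array([[0.4, 0.2], [0.2, 0.2]])      # hypothetical model prediction
gold = np.array([[0.7, 0.1], [0.1, 0.1]])       # hypothetical gold standard

fraction = information_gain_explained(
    mean_log_likelihood(model, fixations),
    mean_log_likelihood(baseline, fixations),
    mean_log_likelihood(gold, fixations),
)
```

Under this definition, a model that matches the baseline scores 0 and a model that matches the gold standard scores 1.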
Abstract
Here we present DeepGaze II, a model that predicts where people look in images. The model uses the features from the VGG-19 deep neural network trained to identify objects in images. Contrary to other saliency models that use deep features, here we use the VGG features for saliency prediction with no additional fine-tuning (rather, a few readout layers are trained on top of the VGG features to predict saliency). The model is therefore a strong test of transfer learning. After conservative cross-validation, DeepGaze II explains about 87% of the explainable information gain in the patterns of fixations and achieves top performance in area under the curve metrics on the MIT300 hold-out benchmark. These results corroborate the finding from DeepGaze I (which explained 56% of the explainable information gain), that deep features trained on object recognition provide a versatile feature space…
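The readout idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' architecture: the random array stands in for frozen VGG-19 features, and the two-layer 1x1-convolution readout, its layer sizes, and the function name `readout_saliency` are all assumptions. Only the readout weights would be trained; the features stay fixed. A softmax over all pixel locations turns the final map into a probability density over fixation positions.

```python
import numpy as np

def readout_saliency(features, weights, biases):
    """Map a frozen feature map of shape (H, W, C) to a fixation density (H, W)."""
    x = features
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b                    # 1x1 convolution: per-pixel linear map over channels
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)       # ReLU between readout layers
    logits = x[..., 0]                   # last layer has a single output channel
    logits = logits - logits.max()       # subtract max for numerical stability
    density = np.exp(logits)
    return density / density.sum()       # softmax over all pixel locations

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 16))           # stand-in for frozen VGG-19 features
weights = [rng.standard_normal((16, 4)) * 0.1,       # hypothetical readout layer sizes
           rng.standard_normal((4, 1)) * 0.1]
biases = [np.zeros(4), np.zeros(1)]

density = readout_saliency(features, weights, biases)
```

Because the output sums to one over the image, it can be scored directly with log-likelihood on observed fixations.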
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Human Pose and Action Recognition
Methods: Visual Geometry Group 19 Layer CNN (VGG-19) · Dropout · Dense Connections · Max Pooling · Softmax · Convolution
