DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
Akis Linardos, Matthias Kümmerer, Ori Press, Matthias Bethge

TL;DR
DeepGaze IIE advances saliency prediction by combining multiple backbones for better calibration and out-of-domain performance, achieving state-of-the-art results on key benchmarks.
Contribution
This work introduces DeepGaze IIE, a novel model that combines multiple ImageNet backbones for improved calibration and generalization in saliency prediction.
Findings
Replacing the VGG19 backbone of DeepGaze II with ResNet50 features improves saliency prediction performance from 78% to 85%.
Combining multiple backbones achieves 93% on MIT1003, a new state-of-the-art.
Generalization to other datasets differs substantially between backbones, and the models are consistently overconfident in their fixation predictions.
Abstract
Since 2014, transfer learning has been the key driver of improvement in spatial saliency prediction; however, progress has stagnated over the last 3-5 years. We conduct a large-scale transfer learning study which tests different ImageNet backbones, always using the same readout architecture and learning protocol adopted from DeepGaze II. By replacing the VGG19 backbone of DeepGaze II with ResNet50 features we improve the performance on saliency prediction from 78% to 85%. However, as we continue to test better ImageNet models as backbones (such as EfficientNetB5) we observe no additional improvement on saliency prediction. By analyzing the backbones further, we find that generalization to other datasets differs substantially, with models being consistently overconfident in their fixation predictions. We show that by combining multiple backbones in a principled manner a good…
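The abstract is cut off before it says how the backbones are combined, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes each backbone yields a per-image log-density map over pixels and combines them as a uniform mixture (averaging the densities, not the log-densities). Averaging spreads probability mass wherever the individual models disagree, which tends to temper overconfident predictions. The function names and the (row, col) fixation format are hypothetical.

```python
import numpy as np

def log_normalize(log_map):
    """Shift a log-density map so that exp(map) sums to 1 over all pixels."""
    m = log_map.max()
    return log_map - (m + np.log(np.exp(log_map - m).sum()))

def mixture_log_density(log_maps):
    """Uniform mixture of several per-model log-density maps.

    Assumption: every map has the same (H, W) shape. Densities are
    averaged in probability space, computed stably in log space.
    """
    stacked = np.stack([log_normalize(m) for m in log_maps])  # (K, H, W)
    m = stacked.max(axis=0)
    return m + np.log(np.mean(np.exp(stacked - m), axis=0))

def mean_log_likelihood(log_density, fixations):
    """Average log-density at fixated pixels; fixations is an (N, 2) int array of (row, col)."""
    rows, cols = fixations[:, 0], fixations[:, 1]
    return float(np.mean(log_density[rows, cols]))
```

By Jensen's inequality, the mixture's log-likelihood on any set of fixations is at least the average of the individual models' log-likelihoods, so this kind of ensemble is never worse than its average member under that score.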
Taxonomy
Topics: Visual Attention and Saliency Detection
