How close are we to understanding image-based saliency?
Matthias Kümmerer, Thomas Wallis, Matthias Bethge

TL;DR
This paper frames image-based saliency models as point processes so that they can be evaluated by log-likelihood, and finds that state-of-the-art models capture only one third of the explainable spatial information in fixations.
Contribution
It introduces a probabilistic point process framework for saliency evaluation and provides a method to identify where models fail to capture fixation information.
Findings
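State-of-the-art models capture only one third of the explainable spatial information in fixations.
Framing saliency models as point processes makes log-likelihoods computable, so models can be compared in terms of information gain.
The method shows where and how current saliency models fail to capture information in the fixations.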
Abstract
Within the set of the many complex factors driving gaze placement, the properties of an image that are associated with fixations under free viewing conditions have been studied extensively. There is a general impression that the field is close to understanding this particular association. Here we frame saliency models probabilistically as point processes, allowing the calculation of log-likelihoods and bringing saliency evaluation into the domain of information. We compare the information gain of state-of-the-art models to a gold standard and find that only one third of the explainable spatial information is captured. We additionally provide a principled method to show where and how models fail to capture information in the fixations. Thus, contrary to previous assertions, purely spatial saliency remains a significant challenge.
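The evaluation idea in the abstract can be illustrated with a minimal sketch. It assumes each model is represented as a probability density over pixels, that information gain is the difference in average log-likelihood (in bits per fixation) relative to a baseline, and that the "explained" share is the model's gain divided by the gold standard's gain. The function names, the uniform baseline, and the hand-picked 2x2 densities are all hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def average_log_likelihood(density, fixations):
    """Mean log2-probability (bits per fixation) of fixations under a density map."""
    density = np.asarray(density, dtype=float)
    density = density / density.sum()  # normalise to a distribution over pixels
    rows, cols = zip(*fixations)
    return float(np.mean(np.log2(density[list(rows), list(cols)])))


def information_gain(model, baseline, fixations):
    """Bits per fixation gained by using `model` instead of `baseline`."""
    return (average_log_likelihood(model, fixations)
            - average_log_likelihood(baseline, fixations))


def explained_fraction(model, gold, baseline, fixations):
    """Share of the gold standard's information gain that the model achieves."""
    return (information_gain(model, baseline, fixations)
            / information_gain(gold, baseline, fixations))


# Toy 2x2 "image" with made-up numbers (assumption: not taken from the paper).
fixations = [(0, 0), (0, 0), (0, 0), (1, 1)]    # (row, col) fixation locations
baseline = np.full((2, 2), 0.25)                # uniform baseline, for simplicity
gold = np.array([[0.5, 0.125], [0.125, 0.25]])  # stand-in for a gold standard
model = np.array([[0.5, 0.25], [0.125, 0.125]]) # stand-in for a saliency model

print(information_gain(gold, baseline, fixations))
print(information_gain(model, baseline, fixations))
print(explained_fraction(model, gold, baseline, fixations))
```

In this toy setup the model's density is too low at the pixel (1, 1), which is where it loses information relative to the gold standard. Inspecting per-fixation log-probabilities in this way is one simple route to seeing where a model falls short, though the paper's own procedure may differ.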
Taxonomy
Topics: Visual Attention and Saliency Detection
