Improving saliency models' predictions of the next fixation with humans' intrinsic cost of gaze shifts
Florian Kadner, Tobias Thomas, David Hoppe, Constantin A. Rothkopf

TL;DR
This paper introduces a framework that augments saliency models with the human cost of gaze shifts and sequential decision-making, improving predictions of where humans will look next.
Contribution
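It presents an algorithm that converts any static saliency map into a sequence of dynamic, history-dependent value maps by combining the saliency map with the measured human cost of gaze shifts and an exploration bonus. The resulting predictions outperform those of the underlying saliency models.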
Findings
Significant improvement in NSS and AUC scores across datasets
Effective integration of human gaze cost functions into saliency prediction
Outperforms five state-of-the-art saliency models on multiple datasets
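For readers unfamiliar with the NSS metric named above: Normalized Scanpath Saliency is commonly computed by z-scoring a prediction map and averaging the normalized values at the locations humans actually fixated. The following is a minimal sketch under that standard definition, not the paper's evaluation code. The function name `nss`, the list-of-lists map layout, the `(row, col)` fixation format, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def nss(saliency, fixations):
    """Normalized Scanpath Saliency of a 2D map at the given fixations.

    saliency  : list of rows, each a list of floats (assumed layout)
    fixations : list of (row, col) pixel indices (assumed format)
    """
    flat = [v for row in saliency for v in row]
    mu = fmean(flat)
    sigma = pstdev(flat)  # population std; some toolboxes use the sample std
    if sigma == 0:
        raise ValueError("NSS is undefined for a constant map")
    # Average the z-scored map values at the fixated pixels.
    return fmean([(saliency[y][x] - mu) / sigma for y, x in fixations])


# Toy usage: a 2x2 map with a single salient pixel.
toy_map = [[0.0, 0.0],
           [0.0, 1.0]]
score_on_peak = nss(toy_map, [(1, 1)])
score_everywhere = nss(toy_map, [(0, 0), (0, 1), (1, 0), (1, 1)])
```

A score above zero means fixations landed on locations the map rated above its own average. Fixating every pixel equally gives a score of zero by construction.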
Abstract
The human prioritization of image regions can be modeled in a time invariant fashion with saliency maps or sequentially with scanpath models. However, while both types of models have steadily improved on several benchmarks and datasets, there is still a considerable gap in predicting human gaze. Here, we leverage two recent developments to reduce this gap: theoretical analyses establishing a principled framework for predicting the next gaze target and the empirical measurement of the human cost for gaze switches independently of image content. We introduce an algorithm in the framework of sequential decision making, which converts any static saliency map into a sequence of dynamic history-dependent value maps, which are recomputed after each gaze shift. These maps are based on 1) a saliency map provided by an arbitrary saliency model, 2) the recently measured human cost function…
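The idea of recomputing a value map after each gaze shift can be sketched as follows. This is an illustrative toy, not the authors' algorithm: the abstract is truncated here and does not give the functional forms, so the linear distance cost, the recency-weighted penalty on previously fixated locations (standing in for the exploration bonus), the greedy argmax choice, and all names and parameter values (`value_map`, `predict_scanpath`, `cost_weight`, `bonus_weight`, `radius`) are assumptions.

```python
import math


def value_map(saliency, current, history,
              cost_weight=0.05, bonus_weight=0.5, radius=1.5):
    """Combine saliency, a gaze-shift cost, and an exploration term.

    Every functional form below is a placeholder (assumption), not the
    empirically measured human cost function.
    """
    rows, cols = len(saliency), len(saliency[0])
    cy, cx = current
    values = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Hypothetical cost: linear in saccade amplitude.
            cost = cost_weight * math.hypot(y - cy, x - cx)
            # Hypothetical exploration term: discount locations near past
            # fixations, weighting the most recent fixation most strongly.
            visited = 0.0
            for age, (fy, fx) in enumerate(reversed(history)):
                if math.hypot(y - fy, x - fx) <= radius:
                    visited += 0.5 ** age
            values[y][x] = saliency[y][x] - cost - bonus_weight * visited
    return values


def predict_scanpath(saliency, start, n_fixations):
    """Greedily pick the highest-valued pixel, then recompute the map."""
    rows, cols = len(saliency), len(saliency[0])
    history = [start]
    current = start
    for _ in range(n_fixations):
        values = value_map(saliency, current, history)
        current = max(((y, x) for y in range(rows) for x in range(cols)),
                      key=lambda p: values[p[0]][p[1]])
        history.append(current)
    return history[1:]


# Toy usage: a flat 5x5 map with two peaks, starting from the centre.
toy = [[0.1] * 5 for _ in range(5)]
toy[0][0] = 1.0
toy[4][4] = 0.9
path = predict_scanpath(toy, start=(2, 2), n_fixations=2)
```

The point of the sketch is the contrast with a static map: taking the argmax of the saliency map alone would return the same pixel at every step, whereas the history-dependent map moves on once a location has been fixated.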
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology
