What Do Deep Saliency Models Learn about Visual Attention?
Shi Chen, Ming Jiang, Qi Zhao

TL;DR
This paper introduces a new analytical framework that interprets and quantifies what deep saliency models learn about visual attention, revealing their implicit features and how they relate to semantic attributes.
Contribution
The paper presents a novel framework that decomposes deep saliency models into interpretable semantic bases, enabling detailed analysis of their learned features and behaviors.
Findings
Semantic attributes can contribute to saliency predictions with either positive or negative weights.
Training data and architecture significantly affect model behavior.
The framework reveals common failure patterns and attention characteristics in various scenarios.
Abstract
In recent years, deep saliency models have made significant progress in predicting human visual attention. However, the mechanisms behind their success remain largely unexplained due to the opaque nature of deep neural networks. In this paper, we present a novel analytic framework that sheds light on the implicit features learned by saliency models and provides principled interpretation and quantification of their contributions to saliency prediction. Our approach decomposes these implicit features into interpretable bases that are explicitly aligned with semantic attributes and reformulates saliency prediction as a weighted combination of probability maps connecting the bases and saliency. By applying our framework, we conduct extensive analyses from various perspectives, including the positive and negative weights of semantics, the impact of training data and architectural designs,…
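The reformulation described in the abstract, saliency as a weighted combination of probability maps tied to semantic bases, can be illustrated with a small toy sketch. This is not the authors' implementation. The function names, the use of a sigmoid over a dot product to obtain each probability map, and the toy features, bases, and weights are all assumptions made purely for illustration.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def probability_maps(features, bases):
    """Return one H x W map per basis.

    features: H x W x C nested lists (hypothetical per-location features).
    bases:    K x C nested lists (hypothetical semantic bases).
    Assumption: the probability of a basis at a location is the sigmoid
    of the dot product between the location's feature and the basis.
    """
    return [
        [[sigmoid(sum(f * b for f, b in zip(pixel, basis))) for pixel in row]
         for row in features]
        for basis in bases
    ]


def combine(prob_maps, weights):
    """Saliency at each location = sum over bases of weight * probability."""
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    return [
        [sum(w * pm[i][j] for w, pm in zip(weights, prob_maps))
         for j in range(width)]
        for i in range(height)
    ]


# Toy 2 x 2 feature map with 2 channels (made-up values).
features = [[[2.0, 0.0], [0.0, 2.0]],
            [[0.0, 0.0], [2.0, 2.0]]]
bases = [[1.0, 0.0], [0.0, 1.0]]   # two hypothetical bases
weights = [1.0, -0.5]              # one positive, one negative weight

prob_maps = probability_maps(features, bases)
saliency = combine(prob_maps, weights)
```

In this toy setup, the location that activates only the positively weighted basis scores higher than the location that activates only the negatively weighted one, which is the kind of per-attribute contribution the framework is meant to quantify.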
Taxonomy
Topics: Visual Attention and Saliency Detection · Face Recognition and Perception · Visual perception and processing mechanisms
