Visual Attention is Beyond One Single Saliency Map
Jian Li

TL;DR
This paper argues that visual attention should be modeled as a dynamic process rather than a single static saliency map, using a frequency domain model to better predict human fixation behavior over time.
Contribution
It introduces a global inhibition model in the frequency domain to simulate the dynamic nature of visual attention and fixation distribution.
Findings
The model predicts how the distribution of human fixations changes over time.
The dynamic attention process is influenced by a single parameter in the frequency domain.
Single saliency maps are insufficient to describe human visual attention.
Abstract
In recent years, numerous bottom-up attention models have been proposed based on different assumptions. However, the saliency maps they produce may differ from each other even for the same input image. We also observe that the human fixation map varies greatly over time. When people freely view an image, they tend to allocate attention to large-scale salient regions at first, and then search progressively more detailed regions. In this paper, we argue that, for one input image, visual attention cannot be described by a single saliency map, and that this mechanism should be modeled as a dynamic process. Under the frequency domain paradigm, we propose a global inhibition model that mimics this process by suppressing the non-saliency in the input image; we also show that the dynamic process is influenced by one parameter in the frequency domain. Experiments illustrate that the proposed model…
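The idea of one frequency-domain parameter yielding a sequence of saliency maps can be illustrated with a minimal sketch. This is not the paper's model, whose details are not given in the abstract. It assumes a common spectral approach: smooth the amplitude spectrum with a Gaussian of width `sigma`, keep the original phase, and transform back. The function names and the choice of `sigma` as the controlling parameter are hypothetical.

```python
import numpy as np


def gaussian_kernel_2d(shape, sigma):
    """Wrap-around Gaussian kernel with its peak at index (0, 0), summing to 1."""
    h, w = shape
    # Signed integer offsets on a periodic grid: 0, 1, ..., -2, -1.
    y = np.fft.fftfreq(h) * h
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    kernel = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def circular_smooth(array, sigma):
    """Circular Gaussian smoothing of a real 2D array via the convolution theorem."""
    kernel = gaussian_kernel_2d(array.shape, sigma)
    return np.real(np.fft.ifft2(np.fft.fft2(array) * np.fft.fft2(kernel)))


def saliency_at_scale(image, sigma):
    """Assumed stand-in for global inhibition: flatten spectral peaks, keep phase."""
    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    # Smoothing the amplitude attenuates sharp spectral peaks, which
    # correspond to repeated (non-salient) patterns in the image.
    smoothed = circular_smooth(amplitude, sigma)
    reconstructed = np.fft.ifft2(smoothed * np.exp(1j * phase))
    saliency = np.abs(reconstructed) ** 2
    total = saliency.sum()
    return saliency / total if total > 0 else saliency


def saliency_sequence(image, sigmas):
    """One saliency map per parameter value, instead of a single static map."""
    return [saliency_at_scale(image, s) for s in sigmas]


if __name__ == "__main__":
    img = np.zeros((32, 32))
    img[10:14, 18:22] = 1.0  # a single bright block on a dark background
    for s, m in zip([0.05, 1.0, 4.0], saliency_sequence(img, [0.05, 1.0, 4.0])):
        print(f"sigma={s}: peak at {np.unravel_index(m.argmax(), m.shape)}")
```

Each value of `sigma` gives a different map for the same image, which is the sense in which a single saliency map is insufficient. How the parameter values should be ordered to match the coarse-to-fine fixation behavior described in the abstract is left open here.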
Taxonomy
Topics: Visual Attention and Saliency Detection · Olfactory and Sensory Function Studies · Visual perception and processing mechanisms
