CNN-based search model underestimates attention guidance by simple visual features
Endel Poder

TL;DR
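This paper evaluates a CNN-based attention guidance model on visual search tasks. It finds that the model underestimates human attention guidance by simple visual features. The likely reasons are that the model lacks bottom-up guidance, or that standard CNNs do not learn the features that human-like guidance requires.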
Contribution
The study adapts the CNN-based attention model of Zhang et al. (2018) to search experiments that use accuracy as the performance measure. It shows that the model falls short of the attention guidance observed in humans.
Findings
The CNN model underestimates human attention guidance by simple visual features
The model lacks bottom-up guidance of attention
Standard CNNs may not learn the features needed for human-like attention guidance
Abstract
Recently, Zhang et al. (2018) proposed an interesting model of attention guidance that uses visual features learnt by convolutional neural networks for object recognition. I adapted this model for search experiments with accuracy as the measure of performance. Simulation of our previously published feature and conjunction search experiments revealed that the CNN-based search model considerably underestimates human attention guidance by simple visual features. A simple explanation is that the model has no bottom-up guidance of attention. Another view might be that standard CNNs do not learn the features required for human-like attention guidance.
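The abstract attributes the shortfall partly to the absence of bottom-up guidance. A minimal Python sketch makes the distinction between top-down and bottom-up priority concrete. It is not the paper's implementation. The 2-D feature vectors, the distance-from-mean saliency term, the weight w_bu, the Gaussian noise and the accuracy rule are all hypothetical assumptions. Under that rule, a trial counts as correct when the target is among the k highest-priority items. The top-down map matches the target's features against the features at each display location, loosely in the spirit of Zhang et al. (2018). Setting w_bu = 0 gives a purely top-down model.

```python
import random

def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_down_map(feature_map, target_vec):
    """Top-down priority: how well each item's features match the target's features."""
    return [dot(f, target_vec) for f in feature_map]

def bottom_up_map(feature_map):
    """Hypothetical bottom-up saliency: Euclidean distance of each item's
    features from the mean features of the whole display."""
    n, dim = len(feature_map), len(feature_map[0])
    mean = [sum(f[d] for f in feature_map) / n for d in range(dim)]
    return [sum((f[d] - mean[d]) ** 2 for d in range(dim)) ** 0.5 for f in feature_map]

def trial_correct(feature_map, target_vec, target_idx, k, w_bu=0.0, noise=0.0, rng=random):
    """Assumed accuracy rule: correct if the target is among the k
    highest-priority items. w_bu = 0.0 means no bottom-up guidance."""
    td = top_down_map(feature_map, target_vec)
    bu = bottom_up_map(feature_map)
    # Combined priority with Gaussian noise standing in for internal noise.
    priority = [t + w_bu * b + rng.gauss(0.0, noise) for t, b in zip(td, bu)]
    ranked = sorted(range(len(priority)), key=lambda i: priority[i], reverse=True)
    return target_idx in ranked[:k]

def accuracy(set_size, k, w_bu, noise, n_trials=1000, seed=0):
    """Proportion correct for a toy feature search: one 'red' target among
    'green' distractors, coded as hypothetical 2-D feature vectors."""
    rng = random.Random(seed)
    target_vec = [1.0, 0.0]
    display = [[0.0, 1.0]] * (set_size - 1) + [target_vec]
    hits = sum(trial_correct(display, target_vec, set_size - 1, k, w_bu, noise, rng)
               for _ in range(n_trials))
    return hits / n_trials

if __name__ == "__main__":
    print("top-down only:      ", accuracy(set_size=8, k=1, w_bu=0.0, noise=0.5))
    print("top-down + bottom-up:", accuracy(set_size=8, k=1, w_bu=1.0, noise=0.5))
```

In this toy display the bottom-up term favours the odd-one-out target. Adding it with w_bu = 1.0 widens the target's priority margin over each distractor from 1.0 to about 2.06. For a fixed noise level, that wider margin tends to raise accuracy. The sketch only illustrates the kind of guidance a purely top-down model leaves out. It does not reproduce the paper's simulations or results.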
Taxonomy
Topics: Visual Attention and Saliency Detection · Visual perception and processing mechanisms · Infrared Target Detection Methodologies
