Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling
Hosnieh Sattar, Andreas Bulling, Mario Fritz

TL;DR
This paper introduces a gaze pooling layer for CNNs that predicts the categories and attributes of visual search targets from eye gaze data, and that remains effective when added to an already trained CNN.
Contribution
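The authors propose a Gaze Pooling Layer that integrates gaze data into CNN-based architectures as an attention mechanism, covering both spatial and temporal aspects of gaze behavior. The layer can be added to an already trained CNN, which avoids collecting large joint datasets of images and gaze.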
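To make the idea of gaze acting as an attention mechanism more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that fixations are turned into a normalized fixation density map (a sum of Gaussians weighted by fixation duration, as one plausible way to capture spatial and temporal aspects) and that this map weights the spatial feature maps of a frozen CNN before pooling. The names `fixation_density_map` and `gaze_pool`, the `sigma` parameter, and the duration weighting are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Map2D = List[List[float]]
# Hypothetical fixation format: (row, col, duration) in feature-map coordinates.
Fixation = Tuple[float, float, float]


def fixation_density_map(fixations: Sequence[Fixation], height: int,
                         width: int, sigma: float = 1.0) -> Map2D:
    """Build a map that sums one duration-weighted Gaussian per fixation.

    The result is normalized to sum to 1. With no usable fixations it falls
    back to a uniform map (an assumption of this sketch).
    """
    fdm = [[0.0] * width for _ in range(height)]
    for row, col, duration in fixations:
        for r in range(height):
            for c in range(width):
                dist_sq = (r - row) ** 2 + (c - col) ** 2
                fdm[r][c] += duration * math.exp(-dist_sq / (2.0 * sigma ** 2))
    total = sum(sum(line) for line in fdm)
    if total == 0.0:
        uniform = 1.0 / (height * width)
        return [[uniform] * width for _ in range(height)]
    return [[value / total for value in line] for line in fdm]


def gaze_pool(feature_maps: Sequence[Map2D], fdm: Map2D) -> List[float]:
    """Pool each channel as a gaze-weighted sum over spatial positions.

    With a uniform map this reduces to global average pooling, so the
    layer can sit on top of an already trained network.
    """
    pooled = []
    for channel in feature_maps:
        weighted = sum(channel[r][c] * fdm[r][c]
                       for r in range(len(fdm))
                       for c in range(len(fdm[0])))
        pooled.append(weighted)
    return pooled


if __name__ == "__main__":
    # Two toy 3x3 channels: one responds top-left, one bottom-right.
    top_left = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    bottom_right = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    gaze = fixation_density_map([(0.0, 0.0, 0.3)], height=3, width=3)
    print(gaze_pool([top_left, bottom_right], gaze))
```

In this sketch the pooled vector would then be passed to the classifier of the pre-trained network. Channels that respond where the user looked, and for longer, contribute more to the pooled features.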
Findings
Effective gaze-based search target prediction demonstrated
Gaze pooling layer improves recognition accuracy
Method works with pre-trained CNNs without retraining
Abstract
Predicting the target of visual search from eye fixation (gaze) data is a challenging problem with many applications in human-computer interaction. In contrast to previous work that has focused on individual instances as a search target, we propose the first approach to predict categories and attributes of search targets based on gaze data. However, state of the art models for categorical recognition, in general, require large amounts of training data, which is prohibitive for gaze data. To address this challenge, we propose a novel Gaze Pooling Layer that integrates gaze information into CNN-based architectures as an attention mechanism - incorporating both spatial and temporal aspects of human gaze behavior. We show that our approach is effective even when the gaze pooling layer is added to an already trained CNN, thus eliminating the need for expensive joint data collection of visual…
Taxonomy
Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Retinal Imaging and Analysis
