Integrating Human Gaze into Attention for Egocentric Activity Recognition
Kyle Min, Jason J. Corso

TL;DR
This paper presents a probabilistic method for incorporating human gaze data into the attention mechanism of a deep neural network for egocentric activity recognition. The method handles both the uncertainty of gaze measurements and the absence of gaze data at test time.
Contribution
It introduces a variational approach that models gaze fixation points as latent variables, enabling gaze integration without needing gaze data at test time.
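The idea of treating the gaze location as a discrete latent variable can be illustrated with a small sketch. This is not the authors' implementation. It assumes a flattened H×W grid of candidate gaze locations, uses a Gumbel-Softmax relaxation to draw a differentiable sample during training (an assumption; the summary does not name the sampling technique), and falls back to the predicted distribution itself when no gaze data is available. All function names and the toy numbers are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_sample(logits, temperature, rng):
    """Relaxed sample from a categorical distribution (assumed technique)."""
    noisy = []
    for x in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))  # add Gumbel(0, 1) noise
    return softmax(noisy, temperature)


def attend(features, attention):
    """Attention-weighted sum of per-location feature vectors."""
    pooled = [0.0] * len(features[0])
    for weight, feature in zip(attention, features):
        for d, value in enumerate(feature):
            pooled[d] += weight * value
    return pooled


def kl_categorical(q, p, eps=1e-12):
    """KL(q || p) between two categorical distributions on the same grid."""
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))


# Toy setup: a 2x3 grid of locations, each with a 2-dimensional feature.
rng = random.Random(0)
H, W = 2, 3
gaze_logits = [0.1, 2.0, 0.3, -1.0, 0.5, 0.0]  # hypothetical network output
features = [[float(i), 1.0] for i in range(H * W)]

# Training: sample a soft gaze map, then pool features with it.
train_attention = gumbel_softmax_sample(gaze_logits, 0.5, rng)
pooled_train = attend(features, train_attention)

# Test time: no gaze data, so use the predicted distribution directly.
test_attention = softmax(gaze_logits)
pooled_test = attend(features, test_attention)

# A variational objective would typically include a KL term like this one,
# here measured against a uniform prior purely for illustration.
uniform_prior = [1.0 / (H * W)] * (H * W)
kl_term = kl_categorical(test_attention, uniform_prior)
```

In a real model the logits would come from a network over video features, and the KL term would be taken against a distribution derived from recorded gaze during training; both are left out of this sketch.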
Findings
Outperforms the previous state of the art on the EGTEA dataset
Effectively models gaze uncertainty with a probabilistic approach
Improves activity recognition accuracy using gaze-based attention
Abstract
It is well known that human gaze carries significant information about visual attention. However, there are three main difficulties in incorporating the gaze data in an attention mechanism of deep neural networks: 1) the gaze fixation points are likely to have measurement errors due to blinking and rapid eye movements; 2) it is unclear when and how much the gaze data is correlated with visual attention; and 3) gaze data is not always available in many real-world situations. In this work, we introduce an effective probabilistic approach to integrate human gaze into spatiotemporal attention for egocentric activity recognition. Specifically, we represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties. In addition, we model the distribution of gaze fixations using a variational method. The gaze distribution is learned during the…
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
