GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths

Xianyu Chen; Ming Jiang; Qi Zhao

arXiv:2408.02788·cs.CV·August 7, 2024

TL;DR

GazeXplain introduces a model that jointly predicts human visual scanpaths and generates natural-language explanations of the fixations, using annotations and co-training across eye-tracking datasets to expose the rationale behind gaze behavior.

Contribution

The paper presents an attention-language decoder, combined with a semantic alignment mechanism and cross-dataset co-training, for explainable scanpath prediction, bridging the gap between gaze prediction and explanation.

Findings

01. Effective in predicting scanpaths across datasets

02. Generates coherent natural language explanations

03. Improves understanding of visual attention mechanisms

Abstract

While exploring visual scenes, humans' scanpaths are driven by their underlying attention processes. Understanding visual scanpaths is essential for various applications. Traditional scanpath models predict the where and when of gaze shifts without providing explanations, creating a gap in understanding the rationale behind fixations. To bridge this gap, we introduce GazeXplain, a novel study of visual scanpath prediction and explanation. This involves annotating natural-language explanations for fixations across eye-tracking datasets and proposing a general model with an attention-language decoder that jointly predicts scanpaths and generates explanations. It integrates a unique semantic alignment mechanism to enhance the consistency between fixations and explanations, alongside a cross-dataset co-training approach for generalization. These novelties present a comprehensive and…
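The abstract does not say how the semantic alignment mechanism is computed. As a rough illustration of the general idea of encouraging consistency between fixations and their explanations, the sketch below uses a cosine-similarity loss over paired embeddings. The function names (`cosine_similarity`, `alignment_loss`), the one-embedding-per-fixation pairing, and the choice of cosine distance are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Treat a zero vector as having no measurable similarity.
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_loss(fixation_embs, explanation_embs):
    """Hypothetical alignment loss: mean (1 - cosine similarity) over pairs.

    Assumes each fixation has exactly one explanation embedding of the
    same dimensionality. The loss is 0 when every pair points in the same
    direction and grows as the pairs diverge.
    """
    if len(fixation_embs) != len(explanation_embs):
        raise ValueError("each fixation needs exactly one explanation embedding")
    if not fixation_embs:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(f, e)
        for f, e in zip(fixation_embs, explanation_embs)
    )
    return total / len(fixation_embs)
```

In a joint model of this kind, a term like this would typically be added to the scanpath-prediction and language-generation losses, so that the features driving each fixation and the text describing it are pulled toward each other during training.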

Taxonomy

Topics: Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications · Topic Modeling

Methods: Softmax · Attention Is All You Need