GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations
Qing Chang, Zhiming Hu

TL;DR
GazeInterpreter is a novel LLM-based system that interprets eye gaze data and integrates it with body motion to generate coherent narrations, advancing human behavior understanding.
Contribution
It introduces a hierarchical, iterative approach that combines symbolic gaze parsing with LLMs to produce eye-body-coordinated narrations, a new method for behavior interpretation.
Findings
Effective in generating integrated eye-body narrations
Improves performance on action anticipation tasks
Enhances behavior summarization accuracy
Abstract
Comprehensively interpreting human behavior is a core challenge in human-aware artificial intelligence. However, prior work has typically focused on body behavior, neglecting the crucial role of eye gaze and its synergy with body motion. We present GazeInterpreter, a novel approach based on large language models (LLMs) that parses eye gaze data to generate eye-body-coordinated narrations. Specifically, our method features 1) a symbolic gaze parser that translates raw gaze signals into symbolic gaze events; 2) a hierarchical structure that first uses an LLM to generate eye gaze narration at the semantic level and then integrates gaze with body motion within the same observation window to produce an integrated narration; and 3) a self-correcting loop that iteratively refines the modality match, temporal coherence, and completeness of the integrated narration. This hierarchical and iterative…
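The abstract does not say how the symbolic gaze parser works. As a rough illustration only, the sketch below uses the standard velocity-threshold (I-VT) classification to turn timestamped gaze samples into fixation and saccade events, then renders each event as a text line of the kind an LLM narration step could consume. The names `parse_gaze` and `to_symbols`, the event text format, the 30 deg/s threshold, and the 0.1 s minimum fixation are all hypothetical choices, not GazeInterpreter's actual parser.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class GazeEvent:
    kind: str     # "fixation" or "saccade"
    start: float  # onset time in seconds
    end: float    # offset time in seconds
    x: float      # mean horizontal gaze position (degrees of visual angle)
    y: float      # mean vertical gaze position (degrees of visual angle)


def parse_gaze(samples, velocity_threshold=30.0, min_fixation=0.1):
    """Classify (t, x, y) gaze samples into events with a velocity threshold (I-VT).

    Hypothetical helper: the threshold (deg/s) and minimum fixation length (s)
    are illustrative defaults, not values taken from GazeInterpreter.
    """
    if len(samples) < 2:
        return []
    # Label each interval between consecutive samples by its angular velocity.
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speed = hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        labels.append("saccade" if speed > velocity_threshold else "fixation")

    # Merge runs of identically labelled intervals into events.
    events = []
    i = 0
    while i < len(labels):
        j = i
        while j + 1 < len(labels) and labels[j + 1] == labels[i]:
            j += 1
        span = samples[i:j + 2]  # intervals i..j cover samples i..j+1
        start, end = span[0][0], span[-1][0]
        # Drop fixations too short to be meaningful; keep all saccades.
        if labels[i] == "saccade" or end - start >= min_fixation:
            events.append(GazeEvent(
                kind=labels[i],
                start=start,
                end=end,
                x=sum(p[1] for p in span) / len(span),
                y=sum(p[2] for p in span) / len(span),
            ))
        i = j + 1
    return events


def to_symbols(events):
    """Render events as one text line each (an assumed format, e.g. for an LLM prompt)."""
    return [f"{e.kind.upper()} {e.start:.2f}-{e.end:.2f}s at ({e.x:.1f}, {e.y:.1f})"
            for e in events]


# Toy trace: 0.5 s looking at (0, 0), a fast jump, then 0.4 s at (10, 0).
samples = [(0.0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (0.3, 0, 0), (0.4, 0, 0), (0.5, 0, 0),
           (0.6, 10, 0), (0.7, 10, 0), (0.8, 10, 0), (0.9, 10, 0), (1.0, 10, 0)]
events = parse_gaze(samples)
for line in to_symbols(events):
    print(line)
```

On the toy trace this yields a fixation, a saccade, and a second fixation. In a pipeline shaped like the one the abstract describes, lines of this kind would feed the LLM that writes the semantic-level gaze narration; the paper's own event vocabulary and thresholds may well differ.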
Taxonomy
Topics: Multimodal Machine Learning Applications · Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection
