GazeInterpreter: Parsing Eye Gaze to Generate Eye-Body-Coordinated Narrations

Qing Chang; Zhiming Hu

arXiv:2511.16245 · cs.HC · November 21, 2025

TL;DR

GazeInterpreter is a novel LLM-based system that interprets eye gaze data and integrates it with body motion to generate coherent narrations, advancing human behavior understanding.

Contribution

The paper introduces a hierarchical, iterative approach that combines symbolic gaze parsing with LLMs to produce eye-body-coordinated narrations, a new method for interpreting human behavior.
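The control flow of a hierarchical, iterative pipeline like this can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The page does not give the prompts, the LLM, or the stopping rule, so each stage is passed in as a hypothetical callable (`narrate_gaze`, `integrate`, `critique`, `revise`). The loop stops when the critic reports no issues or after an assumed `max_rounds` cap. In the abstract's terms, the critic would check modality match, temporal coherence, and completeness.

```python
from typing import Callable, List


def refine_narration(
    gaze_events: List[str],
    body_motion: List[str],
    narrate_gaze: Callable[[List[str]], str],                   # hypothetical LLM call
    integrate: Callable[[str, List[str]], str],                 # hypothetical LLM call
    critique: Callable[[str, List[str], List[str]], List[str]], # hypothetical LLM critic
    revise: Callable[[str, List[str]], str],                    # hypothetical LLM call
    max_rounds: int = 3,                                        # assumed iteration cap
) -> str:
    """Hierarchical generation followed by a bounded self-correction loop."""
    # Level 1: narrate gaze alone, from symbolic gaze events.
    gaze_narration = narrate_gaze(gaze_events)
    # Level 2: merge the gaze narration with body motion from the same window.
    narration = integrate(gaze_narration, body_motion)
    # Self-correction: ask the critic for problems (e.g. modality mismatch,
    # temporal incoherence, missing content) and revise until none remain
    # or the round budget is spent.
    for _ in range(max_rounds):
        issues = critique(narration, gaze_events, body_motion)
        if not issues:
            break
        narration = revise(narration, issues)
    return narration
```

Passing the stages in as functions keeps the loop itself independent of any particular LLM. It also lets the control flow be exercised with deterministic stubs before real prompts are wired in.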

Findings

1. Effective in generating integrated eye-body narrations
2. Improves performance on action anticipation tasks
3. Enhances behavior summarization accuracy

Abstract

Comprehensively interpreting human behavior is a core challenge in human-aware artificial intelligence. However, prior works typically focused on body behavior, neglecting the crucial role of eye gaze and its synergy with body motion. We present GazeInterpreter, a novel large language model-based (LLM-based) approach that parses eye gaze data to generate eye-body-coordinated narrations. Specifically, our method features 1) a symbolic gaze parser that translates raw gaze signals into symbolic gaze events; 2) a hierarchical structure that first uses an LLM to generate an eye gaze narration at the semantic level and then integrates gaze with body motion within the same observation window to produce an integrated narration; and 3) a self-correcting loop that iteratively refines the modality match, temporal coherence, and completeness of the integrated narration. This hierarchical and iterative…
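The abstract does not say how the symbolic gaze parser segments the raw signal. As an illustration only, the sketch below substitutes the classic velocity-threshold identification (I-VT) scheme. Each interval between consecutive samples is labeled a saccade if its angular velocity exceeds a threshold and a fixation otherwise. Consecutive intervals with the same label are then merged into one symbolic event. The sample format `(t, x_deg, y_deg)`, the 30 deg/s threshold, the 100 ms minimum fixation, and the `GazeEvent.to_symbol` string format are all assumptions, not values from the paper.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

# Assumed sample format: (time in s, horizontal angle in deg, vertical angle in deg)
Sample = Tuple[float, float, float]


@dataclass
class GazeEvent:
    kind: str     # "fixation" or "saccade"
    start: float  # seconds
    end: float    # seconds
    x: float      # mean horizontal gaze angle over the event (deg)
    y: float      # mean vertical gaze angle over the event (deg)

    def to_symbol(self) -> str:
        # Hypothetical textual encoding of an event, e.g. for an LLM prompt
        return f"{self.kind.upper()} {self.start:.2f}-{self.end:.2f}s @({self.x:.1f},{self.y:.1f})"


def parse_gaze(
    samples: List[Sample],
    velocity_threshold: float = 30.0,  # deg/s, assumed I-VT threshold
    min_fixation: float = 0.1,         # s, assumed; shorter fixations are dropped
) -> List[GazeEvent]:
    """Velocity-threshold (I-VT) segmentation of time-sorted gaze samples."""
    if len(samples) < 2:
        return []

    # 1) Label every interval between consecutive samples by its angular speed.
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        speed = hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0
        labels.append("saccade" if speed > velocity_threshold else "fixation")

    # 2) Merge runs of identically labeled intervals into events.
    events: List[GazeEvent] = []
    i = 0
    while i < len(labels):
        j = i
        while j + 1 < len(labels) and labels[j + 1] == labels[i]:
            j += 1
        pts = samples[i : j + 2]  # intervals i..j span samples i..j+1
        start, end = pts[0][0], pts[-1][0]
        if not (labels[i] == "fixation" and end - start < min_fixation):
            events.append(
                GazeEvent(
                    kind=labels[i],
                    start=start,
                    end=end,
                    x=sum(p[1] for p in pts) / len(pts),
                    y=sum(p[2] for p in pts) / len(pts),
                )
            )
        i = j + 1
    return events
```

For a 100 Hz recording that holds still, jumps 10 degrees, and holds still again, this yields a fixation, a saccade, and a second fixation. Compact strings such as `FIXATION 0.00-0.20s @(0.0,0.0)` are the kind of symbolic input an LLM could then narrate at the semantic level.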

Taxonomy

Topics: Multimodal Machine Learning Applications · Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection