From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants

Valdemar Danry; Javier Hernandez; Andrew Wilson; Pattie Maes; Judith Amores

arXiv:2604.08062 · cs.HC · April 10, 2026

TL;DR

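This paper introduces a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify where a user is likely struggling and to target follow-up assistance. In a controlled study (n=36), it was rated more accurate and personalized than a text-only LLM assistant, improved information recall, and required fewer spoken words from users.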

Contribution

The paper presents a gaze-aware AI assistant that infers likely points of difficulty from egocentric video with gaze overlays, and evaluates it against a text-only LLM assistant in a controlled study (n=36).

Findings

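1. The gaze-aware assistant was rated as significantly more accurate and personalized than the text-only assistant in its assessments of users' reading behavior.

2. Users recalled significantly more information with the gaze-aware assistant.

3. Interactions were more efficient: users spoke significantly fewer words with the gaze-aware assistant.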

Abstract

Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and target follow-up retrospective assistance. We instantiate this vision in a controlled study (n=36) comparing the gaze-aware AI assistant to a text-only LLM assistant. Compared to a conventional LLM assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users' reading behavior and significantly improved people's ability to recall information. Users spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits in comprehension and…
