Seeing Eye to Eye: Enabling Cognitive Alignment Through Shared First-Person Perspective in Human-AI Collaboration

Zhuyu Teng; Pei Chen; Yichen Cai; Ruoqing Lu; Zhaoqu Jiang; Jiayang Li; Weitao You; Lingyun Sun

arXiv:2603.12701 · cs.HC · March 16, 2026

Open Access

TL;DR

This paper introduces Eye2Eye, a first-person perspective framework for human-AI collaboration that improves communication, understanding, and trust by aligning attention and shared context in AR environments.

Contribution

It presents a novel framework integrating joint attention, revisable memory, and reflective feedback to enhance cognitive alignment in human-AI collaboration.

Findings

01 Reduces task completion time

02 Decreases interaction load

03 Increases user trust

Abstract

Despite advances in multimodal AI, current vision-based assistants often remain inefficient in collaborative tasks. We identify two key gulfs: a communication gulf, where users must translate rich parallel intentions into verbal commands due to the channel mismatch, and an understanding gulf, where AI struggles to interpret subtle embodied cues. To address these, we propose Eye2Eye, a framework that leverages first-person perspective as a channel for human-AI cognitive alignment. It integrates three components: (1) joint attention coordination for fluid focus alignment, (2) revisable memory to maintain evolving common ground, and (3) reflective feedback allowing users to clarify and refine the AI's understanding. We implement this framework in an AR prototype and evaluate it through a user study and a post-hoc pipeline evaluation. Results show that Eye2Eye significantly reduces task…

Taxonomy

Topics: Human-Automation Interaction and Safety · Social Robot Interaction and HRI · Gaze Tracking and Assistive Technology