SkillSight: Efficient First-Person Skill Assessment with Gaze

Chi Hsuan Wu; Kumar Ashutosh; Kristen Grauman

arXiv:2511.19629 · cs.CV · April 7, 2026

TL;DR

SkillSight assesses skill level from first-person data captured on smart glasses. At inference it runs a gaze-only model instead of processing video continuously, which the paper reports cuts power consumption by 73x relative to video-based methods.

Contribution

The paper presents a two-stage framework. The first stage learns to model gaze and egocentric video jointly when predicting skill level; the second distills that joint model into a gaze-only student, so only gaze input is needed at inference.

Findings

01 Gaze data significantly improves skill assessment accuracy.

02 The gaze-only model reduces power consumption by 73x compared to video-based methods.

03 SkillSight achieves state-of-the-art performance across diverse datasets.

Abstract

Egocentric perception on smart glasses could transform how we learn new skills in the physical world, but automatic skill assessment remains a fundamental technical challenge. We introduce SkillSight for power-efficient skill assessment from first-person data. Central to our approach is the hypothesis that skill level is evident not only in how a person performs an activity (video), but also in how they direct their attention when doing so (gaze). Our two-stage framework first learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model. At inference, the student model requires only gaze input, drastically reducing power consumption by eliminating continuous video processing. Experiments on three datasets spanning cooking, music, and sports establish, for the first time, the valuable role of gaze in skill understanding across…
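
The distillation step described in the abstract can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch, not the paper's objective: the abstract does not state the loss, so the temperature-softened KL term, the `alpha` weighting, the three-class skill labels, and every function name below are assumptions chosen for illustration. The teacher logits stand in for the output of a gaze-plus-video model, and the student logits for the output of a gaze-only model.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label] + eps)


def distillation_loss(teacher_logits, student_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (an assumed form, not the paper's).

    soft term: the student matches the teacher's softened distribution.
    hard term: the student also fits the ground-truth skill label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = cross_entropy(softmax(student_logits), label)
    return alpha * soft + (1.0 - alpha) * hard


# Hypothetical logits over three skill levels (class 2 is the true label).
teacher = [0.2, 1.5, 3.0]         # stand-in for the gaze + video teacher
student_close = [0.2, 1.5, 3.0]   # gaze-only student that agrees
student_far = [3.0, 1.5, 0.2]     # gaze-only student that disagrees

loss_close = distillation_loss(teacher, student_close, label=2)
loss_far = distillation_loss(teacher, student_far, label=2)
```

Under these assumptions, a student whose predictions agree with the teacher incurs a lower loss than one that disagrees, which is the signal that would push a gaze-only model toward the behavior of the joint model during training.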
