Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders

Fiona Ryan; Ajay Bati; Sangmin Lee; Daniel Bolya; Judy Hoffman; James M. Rehg

arXiv:2412.09586 · cs.CV · June 5, 2025 · 2 citations



TL;DR

Gaze-LLE is a transformer-based framework that uses a frozen DINOv2 encoder for efficient, accurate gaze target estimation and outperforms prior, more complex methods across multiple benchmarks.

Contribution

The paper presents Gaze-LLE, a novel transformer approach that simplifies gaze target estimation by pairing a frozen, large-scale pre-trained encoder with a lightweight decoding module.
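The encoder/decoder split can be illustrated with a toy sketch in plain Python. This is not the Gaze-LLE code: `CountingEncoder`, `LightDecoder`, and `predict_gaze_for_people` are hypothetical names, and the "features" and "scores" are placeholders. The sketch shows one practical consequence of freezing the encoder: scene features can be computed once per image and reused for every person in it, so only the small decoder runs once per person.

```python
from typing import List, Sequence, Tuple

Tokens = List[List[float]]               # flattened token list: num_tokens x dim
Box = Tuple[float, float, float, float]  # hypothetical normalized head box


class CountingEncoder:
    """Hypothetical stand-in for a frozen backbone such as DINOv2.

    It has no trainable parameters; it only counts its calls so we can check
    that scene features are computed once per image.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, image: Sequence[float]) -> Tokens:
        self.calls += 1
        # Toy "features": one 2-d token per input value.
        return [[v, v * v] for v in image]


class LightDecoder:
    """Hypothetical lightweight decoder: the only component with trainable weights."""

    def __init__(self, dim: int) -> None:
        self.weights = [0.1] * dim  # placeholder trainable parameters

    def __call__(self, tokens: Tokens, box: Box) -> List[float]:
        # Placeholder per-token score. A real decoder would condition on the
        # person (e.g. through a positional prompt) and run transformer layers.
        x0, _, x1, _ = box
        bias = (x0 + x1) / 2
        return [sum(w * t for w, t in zip(self.weights, tok)) + bias for tok in tokens]


def predict_gaze_for_people(
    image: Sequence[float],
    boxes: Sequence[Box],
    encoder: CountingEncoder,
    decoder: LightDecoder,
) -> List[List[float]]:
    tokens = encoder(image)                        # frozen encoder: one pass per image
    return [decoder(tokens, box) for box in boxes]  # light decoder: one pass per person


enc, dec = CountingEncoder(), LightDecoder(dim=2)
scores = predict_gaze_for_people(
    [0.0, 1.0, 2.0], [(0.1, 0.1, 0.3, 0.3), (0.6, 0.2, 0.8, 0.4)], enc, dec
)
print(enc.calls, len(scores))  # 1 2
```

Because a frozen encoder receives no gradient updates, its output for a fixed input image does not change during training, which is what makes reusing the features across people safe.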

Findings

01. Achieves state-of-the-art results on several gaze benchmarks.
02. Demonstrates the effectiveness of using frozen large-scale feature extractors.
03. Provides extensive analysis validating design choices.

Abstract

We address the problem of gaze target estimation, which aims to predict where a person is looking in a scene. Predicting a person's gaze target requires reasoning both about the person's appearance and the contents of the scene. Prior works have developed increasingly complex, hand-crafted pipelines for gaze target estimation that carefully fuse features from separate scene encoders, head encoders, and auxiliary models for signals like depth and pose. Motivated by the success of general-purpose feature extractors on a variety of visual tasks, we propose Gaze-LLE, a novel transformer framework that streamlines gaze target estimation by leveraging features from a frozen DINOv2 encoder. We extract a single feature representation for the scene, and apply a person-specific positional prompt to decode gaze with a lightweight module. We demonstrate state-of-the-art performance across several…
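One way to read "person-specific positional prompt" is sketched below. It assumes, without confirmation from this page, that the prompt is a learned vector added to the scene tokens whose patches fall inside the person's head box, and that the decoder emits a spatial heatmap from which a gaze point is read off by argmax, as is common in gaze target estimation. `head_mask`, `apply_position_prompt`, and `heatmap_argmax` are hypothetical helpers, and the head box is assumed to be given in normalized image coordinates.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in [0, 1]
Grid = List[List[List[float]]]           # grid_h x grid_w x dim token grid


def head_mask(box: Box, grid_h: int, grid_w: int) -> List[List[int]]:
    """Mark patches whose center lies inside the head box (assumed rule)."""
    x0, y0, x1, y1 = box
    mask = []
    for r in range(grid_h):
        cy = (r + 0.5) / grid_h  # normalized y of this row's patch centers
        mask.append([
            1 if (x0 <= (c + 0.5) / grid_w <= x1 and y0 <= cy <= y1) else 0
            for c in range(grid_w)
        ])
    return mask


def apply_position_prompt(tokens: Grid, mask: List[List[int]], prompt: Sequence[float]) -> Grid:
    """Add the prompt vector to every token inside the mask; others are unchanged."""
    return [
        [[t + m * p for t, p in zip(tok, prompt)] for tok, m in zip(tok_row, mask_row)]
        for tok_row, mask_row in zip(tokens, mask)
    ]


def heatmap_argmax(heatmap: List[List[float]]) -> Tuple[float, float]:
    """Return the normalized (x, y) center of the highest-scoring heatmap cell."""
    h, w = len(heatmap), len(heatmap[0])
    _, r, c = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
    return ((c + 0.5) / w, (r + 0.5) / h)


mask = head_mask((0.0, 0.0, 0.5, 0.5), grid_h=4, grid_w=4)
tokens = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
prompted = apply_position_prompt(tokens, mask, prompt=[1.0, -1.0])
print(sum(map(sum, mask)))             # 4
print(prompted[0][0], prompted[3][3])  # [1.0, -1.0] [0.0, 0.0]
print(heatmap_argmax([[0.1, 0.9], [0.2, 0.3]]))  # (0.75, 0.25)
```

Under this reading, the prompt only changes the decoder's input, so the same scene tokens serve every person in the image; each person simply contributes a different mask.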


Code & Models

Repositories

fkryan/gazelle
PyTorch · Official


Taxonomy

Topics: Gaze Tracking and Assistive Technology · Hand Gesture Recognition Systems · Gait Recognition and Analysis