GazeMoE: Perception of Gaze Target with Mixture-of-Experts

Zhuangzhuang Dai; Zhongxi Lu; Vincent G. Zakka; Luis J. Manso; Jose M Alcaraz Calero; Chen Li

arXiv:2603.06256·cs.CV·March 9, 2026

TL;DR

GazeMoE introduces a novel mixture-of-experts framework that enhances gaze target estimation by adaptively integrating multi-modal cues from foundation models, achieving state-of-the-art results in challenging scenarios.

Contribution

The paper presents GazeMoE, a new end-to-end model that leverages MoE modules and multi-modal cues for improved gaze estimation, addressing class imbalance and robustness issues.

Findings

01. Achieves state-of-the-art performance on benchmark datasets.

02. Handles class imbalance between in-frame and out-of-frame gaze targets with a class-balancing auxiliary loss.

03. Demonstrates robustness through strategic data augmentations.
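
The listing does not give the form of the auxiliary loss, so the following is only a minimal sketch of one common way to balance a binary in-frame vs. out-of-frame classifier: binary cross-entropy with inverse-frequency class weights. The function name `class_balanced_bce`, the weighting scheme, and the toy inputs are all assumptions for illustration and are not taken from the paper.

```python
import math

def class_balanced_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy with inverse-frequency class weights.

    probs:  predicted probability that the gaze target is in-frame
    labels: 1 for in-frame, 0 for out-of-frame
    NOTE: this weighting is an illustrative assumption, not the paper's loss.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    # Weight each class by n / (2 * count) so that both classes carry
    # the same total weight regardless of how often they occur.
    w_pos = n / (2 * n_pos) if n_pos else 0.0
    w_neg = n / (2 * n_neg) if n_neg else 0.0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1 - p)
    return total / n

# Toy batch: three in-frame samples, one out-of-frame sample.
labels = [1, 1, 1, 0]
probs = [0.5, 0.5, 0.5, 0.5]
loss = class_balanced_bce(probs, labels)
```

With this weighting, the single out-of-frame sample contributes as much to the loss as the three in-frame samples combined, which is the intended effect of balancing.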

Abstract

Estimating human gaze target from visible images is a critical task for robots to understand human attention, yet the development of generalizable neural architectures and training paradigms remains challenging. While recent advances in pre-trained vision foundation models offer promising avenues for locating gaze targets, the integration of multi-modal cues -- including eyes, head poses, gestures, and contextual features -- demands adaptive and efficient decoding mechanisms. Inspired by Mixture-of-Experts (MoE) for adaptive domain expertise in large vision-language models, we propose GazeMoE, a novel end-to-end framework that selectively leverages gaze-target-related cues from a frozen foundation model through MoE modules. To address class imbalance in gaze target classification (in-frame vs. out-of-frame) and enhance robustness, GazeMoE incorporates a class-balancing auxiliary loss…
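
As a rough illustration of the Mixture-of-Experts idea the abstract refers to, the sketch below routes one feature vector through the top-k experts chosen by a softmax gate. It is a generic top-k MoE layer in plain Python, not GazeMoE's architecture: the helper names (`softmax`, `linear`, `moe_forward`), the linear experts, the choice of `top_k=2`, and the toy weights are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def moe_forward(x, gate_weights, expert_weights, top_k=2):
    """Route x through the top_k experts with the highest gate probability.

    Returns (output, routing), where routing maps expert index -> weight.
    """
    gate_probs = softmax(linear(gate_weights, x))
    # Select the top_k experts by gate probability.
    chosen = sorted(range(len(gate_probs)),
                    key=lambda i: gate_probs[i], reverse=True)[:top_k]
    # Renormalize the selected gate probabilities so they sum to 1.
    norm = sum(gate_probs[i] for i in chosen)
    routing = {i: gate_probs[i] / norm for i in chosen}
    # Weighted sum of the selected experts' outputs.
    output = [0.0] * len(expert_weights[0])
    for i, w in routing.items():
        y = linear(expert_weights[i], x)
        output = [o + w * v for o, v in zip(output, y)]
    return output, routing

# Toy setup: a 2-d feature (standing in for a frozen backbone's output)
# and three experts that each scale the input by 1, 2, or 3.
x = [1.0, 0.0]
gate_weights = [[2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
expert_weights = [[[s, 0.0], [0.0, s]] for s in (1.0, 2.0, 3.0)]
output, routing = moe_forward(x, gate_weights, expert_weights, top_k=2)
```

Here the gate favors experts 0 and 1, so only those two are evaluated and the third is skipped, which is what makes sparse routing cheaper than running every expert on every input.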

Code & Models

Models

🤗 zdai257/GazeMoE (model)

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Visual Attention and Saliency Detection · Multimodal Machine Learning Applications