EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics

Xiaochuan Liu; Xin Cheng; Yuchong Sun; Xiaoxue Wu; Ruihua Song; Hao Sun; Denghao Zhang

arXiv:2502.20858·cs.MM·March 3, 2025

TL;DR

This paper introduces EyEar, a physics-informed learning framework that predicts human gaze trajectories in visual scenes synchronized with audio, filling a gap in multimodal gaze prediction research.

Contribution

The paper proposes a novel physics-informed dynamic model for audio-visual gaze prediction and introduces a new dataset with synchronized gaze and audio data.

Findings

01. EyEar outperforms baseline models across all evaluation metrics.

02. The probability density score improves gaze trajectory stabilization.

03. Incorporating audio significantly enhances gaze prediction accuracy.

Abstract

Imitating how humans move their gaze in a visual scene is a vital research problem for both visual understanding and psychology, with important applications such as building lifelike virtual characters. Previous studies aim to predict gaze trajectories when humans are free-viewing an image, searching for required targets, or looking for clues to answer questions about an image. While these tasks focus on visual-centric scenarios, in more common scenarios humans also move their gaze in response to audio input. To fill this gap, we introduce a new task that predicts human gaze trajectories in a visual scene with synchronized audio inputs and provide a new dataset containing 20k gaze points from 8 subjects. To effectively integrate audio information and simulate the dynamic process of human gaze motion, we propose a novel learning framework called EyEar (Eye moving while Ear listening)…

Code & Models

Repositories

XiaochuanLiu-ruc/EyEar (official)

Videos

EyEar: Learning Audio Synchronized Human Gaze Trajectory Based on Physics-Informed Dynamics