UIGaze: How Closely Can VLMs Approximate Human Visual Attention on User Interfaces?

Min Song; Yoonseong Lee; Yeonhu Seo

arXiv:2604.26352 · cs.HC · April 30, 2026

TL;DR

This paper evaluates how well vision language models (VLMs) predict human visual attention on user interfaces, using real eye-tracking data as ground truth. It finds moderate alignment that varies with UI type and viewing duration.

Contribution

The paper introduces UIGaze, a study that evaluates nine VLMs zero-shot against real eye-tracking data from the UEyes dataset, covering four UI categories (webpage, desktop, mobile, poster).

Findings

01 VLMs achieve moderate correlation with human gaze patterns.

02 Alignment improves with longer viewing durations.

03 Performance varies significantly across different UI types.

Abstract

Vision Language Models (VLMs) have demonstrated strong capabilities in understanding visual content, yet their ability to predict where humans look on user interfaces remains unexplored. We present UIGaze, a study investigating how closely VLMs can approximate human visual attention on user interfaces using real eye-tracking data. Using the UEyes dataset, which comprises 1,980 UI screenshots across four categories (webpage, desktop, mobile, poster) with eye-tracking data from 62 participants, we evaluate nine state-of-the-art VLMs through a zero-shot coordinate prediction pipeline. Each model generates gaze point coordinates that are converted into saliency maps via Gaussian blurring and compared against ground truth using the linear correlation coefficient (CC), similarity (SIM), and Kullback-Leibler (KL) divergence. Our experiments (1,980 images × 9 models × 3 runs × 3 durations) reveal that VLMs achieve moderate alignment with human gaze patterns, with…
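The evaluation step described in the abstract (predicted gaze points, Gaussian blurring, then CC, SIM, and KL divergence against ground truth) can be illustrated with a minimal sketch. This is not the authors' code. It assumes gaze points are given as pixel coordinates, picks an arbitrary blur width `sigma`, and uses the metric definitions common in the saliency literature; the function names are hypothetical. It uses only the Python standard library to stay dependency-free, so it is suitable for small grids only.

```python
import math

def saliency_from_points(points, width, height, sigma):
    """Sum one isotropic Gaussian per (x, y) point.

    Equivalent to Gaussian-blurring a map of point impulses.
    `sigma` (in pixels) is an assumed value, not taken from the paper.
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            dy2 = (y - py) ** 2
            for x in range(width):
                grid[y][x] += math.exp(-((x - px) ** 2 + dy2) / (2.0 * sigma ** 2))
    return grid

def _flat(grid):
    return [v for row in grid for v in row]

def _normalize(values):
    # Scale so the map sums to 1 (treat it as a probability distribution).
    total = sum(values)
    return [v / total for v in values]

def cc(pred, truth):
    """Pearson linear correlation between two maps (1.0 = identical shape).

    Undefined (division by zero) if either map is constant.
    """
    p, t = _flat(pred), _flat(truth)
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    cov = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    var_p = sum((a - mp) ** 2 for a in p)
    var_t = sum((b - mt) ** 2 for b in t)
    return cov / math.sqrt(var_p * var_t)

def sim(pred, truth):
    """Histogram intersection of the two normalized maps (1.0 = identical)."""
    p, t = _normalize(_flat(pred)), _normalize(_flat(truth))
    return sum(min(a, b) for a, b in zip(p, t))

def kl_div(pred, truth, eps=1e-12):
    """KL divergence of the ground truth from the prediction (0.0 = identical)."""
    p, t = _normalize(_flat(pred)), _normalize(_flat(truth))
    return sum(b * math.log(eps + b / (a + eps)) for a, b in zip(p, t))

# Toy usage on a 32x24 grid with made-up gaze points.
human = saliency_from_points([(8, 6), (20, 12)], 32, 24, sigma=3.0)
model = saliency_from_points([(10, 7), (25, 4)], 32, 24, sigma=3.0)
print("CC:", cc(model, human))
print("SIM:", sim(model, human))
print("KL:", kl_div(model, human))
```

Higher CC and SIM and lower KL indicate closer agreement with human gaze. For scale, the experiment grid quoted in the abstract multiplies out to 1,980 × 9 × 3 × 3 = 160,380 image, model, run, and duration combinations.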
