GazeVaLM: A Multi-Observer Eye-Tracking Benchmark for Evaluating Clinical Realism in AI-Generated X-Rays
David Wong, Zeynep Isik, Bin Wang, Marouane Tliba, Gorkem Durak, Elif Keles, Halil Ertugrul Aktas, Aladine Chetouani, Cagdas Topel, Nicolo Gennaro, Camila Lopes Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur

TL;DR
GazeVaLM is a comprehensive eye-tracking dataset and benchmark for evaluating clinical perception and AI model performance in authenticating chest X-rays, enabling human-AI comparison and research in medical image analysis.
Contribution
The paper introduces GazeVaLM, a novel dataset and protocol for studying gaze behavior, diagnostic accuracy, and AI-human comparison in chest radiograph authenticity assessment.
Findings
Radiologists show high gaze agreement and inter-observer consistency.
Benchmarking reveals differences between radiologists and LLMs in diagnostic accuracy.
The dataset enables analysis of gaze patterns, decision-making, and AI performance in medical imaging.
Abstract
We introduce GazeVaLM, a public eye-tracking dataset for studying clinical perception during chest radiograph authenticity assessment. The dataset comprises 960 gaze recordings from 16 expert radiologists interpreting 30 real and 30 synthetic chest X-rays (generated by diffusion based generative AI) under two conditions: diagnostic assessment and real-fake classification (Visual Turing test). For each image-observer pair, we provide raw gaze samples, fixation maps, scanpaths, saliency density maps, structured diagnostic labels, and authenticity judgments. We extend the protocol to 6 state-of-the-art multimodal LLMs, releasing their predicted diagnoses, authenticity labels, and confidence scores under matched conditions - enabling direct human-AI comparison at both decision and uncertainty levels. We further provide analyses of gaze agreement, inter-observer consistency, and benchmarking…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
