Gaze2Report: Radiology Report Generation via Visual-Gaze Prompt Tuning of LLMs
Aishik Konwer, Moinak Bhattacharya, Prateek Prasanna

TL;DR
Gaze2Report is a framework that integrates eye gaze data with large language models to improve radiology report generation. It learns from gaze during training and predicts scanpaths at inference, so no recorded gaze input is needed at test time.
Contribution
The paper introduces a multimodal prompt tuning approach that uses scanpath prediction and graph neural networks (GNNs) to incorporate eye gaze data into LLMs, improving report relevance and interpretability.
Findings
Gaze-guided visual learning improves report quality.
The model predicts scanpaths accurately during inference.
The model operates effectively without gaze input during inference.
Abstract
Existing deep learning methods for radiology report generation enhance diagnostic efficiency but often overlook physician-informed medical priors. This leads to a suboptimal alignment between the structured explanations and disease manifestations. Eye gaze data provides critical insights into a radiologist's visual attention, enhancing the relevance and interpretability of extracted features while aligning with human decision-making processes. However, despite its promising potential, the integration of eye gaze information into AI-driven medical imaging workflows is impeded by challenges such as the complexity of multimodal data fusion and the high cost of gaze acquisition, particularly its absence during inference, limiting its practical applicability in real-world clinical settings. To address these issues, we introduce Gaze2Report, a framework which leverages a scanpath prediction…
