VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback
Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier

TL;DR
VICCA is a multimodal framework for interpreting and validating AI-generated chest X-ray reports without human feedback. It combines phrase grounding, text-to-image synthesis, and a dual-scoring system that rates localization accuracy and semantic accuracy.
Contribution
The paper presents a novel multimodal framework integrating phrase grounding and diffusion models with dual-scoring for validation, advancing report accuracy and interpretability without human feedback.
Findings
Achieves state-of-the-art pathology localization accuracy
Improves semantic consistency between reports and images
Provides a robust validation mechanism for AI-generated reports
Abstract
As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies…
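The dual-scoring idea can be illustrated with a small sketch. The abstract is truncated before it defines either score, so everything below is an assumption for illustration only: the localization score is taken to be intersection-over-union (IoU) between a box grounded on the original image and one grounded on the generated image, and the semantic score is taken to be cosine similarity between feature vectors of the two images. The function names (`box_iou`, `cosine_similarity`, `dual_score`) are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical box format, assumed for illustration: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def dual_score(
    box_original: Box,
    box_generated: Box,
    feat_original: Sequence[float],
    feat_generated: Sequence[float],
) -> Tuple[float, float]:
    """Return (localization_score, semantic_score) under the assumptions above."""
    return (
        box_iou(box_original, box_generated),
        cosine_similarity(feat_original, feat_generated),
    )
```

In this sketch, both scores fall in a bounded range (IoU in [0, 1], cosine similarity in [-1, 1]), so a low value on either axis could flag a report phrase for closer inspection. The actual scoring functions used by VICCA may differ.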
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis
Methods: Diffusion
