VICCA: Visual Interpretation and Comprehension of Chest X-ray Anomalies in Generated Report Without Human Feedback

Sayeh Gholipour Picha; Dawood Al Chanti; Alice Caplier

arXiv:2501.17726 · cs.CV · June 26, 2025

TL;DR

VICCA introduces a multimodal AI framework that improves the interpretability and validation of chest X-ray reports by combining phrase grounding, image synthesis, and dual-scoring for localization and semantic accuracy, enhancing trustworthiness in medical AI.

Contribution

The paper presents a novel multimodal framework integrating phrase grounding and diffusion models with dual-scoring for validation, advancing report accuracy and interpretability without human feedback.

Findings

01 Achieves state-of-the-art pathology localization accuracy

02 Improves semantic consistency between reports and images

03 Provides a robust validation mechanism for AI-generated reports

Abstract

As artificial intelligence (AI) becomes increasingly central to healthcare, the demand for explainable and trustworthy models is paramount. Current report generation systems for chest X-rays (CXR) often lack mechanisms for validating outputs without expert oversight, raising concerns about reliability and interpretability. To address these challenges, we propose a novel multimodal framework designed to enhance the semantic alignment and localization accuracy of AI-generated medical reports. Our framework integrates two key modules: a Phrase Grounding Model, which identifies and localizes pathologies in CXR images based on textual prompts, and a Text-to-Image Diffusion Module, which generates synthetic CXR images from prompts while preserving anatomical fidelity. By comparing features between the original and generated images, we introduce a dual-scoring system: one score quantifies…
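
The abstract is cut off before it defines the two scores, so the sketch below does not reproduce the authors' formulation. It only illustrates the general idea of pairing a localization score with a semantic score, under assumptions: bounding-box IoU stands in for localization agreement, cosine similarity between feature vectors stands in for semantic agreement, and the names `box_iou`, `cosine_similarity`, and `dual_score` are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def dual_score(
    box_original: Box,
    box_generated: Box,
    feat_original: Sequence[float],
    feat_generated: Sequence[float],
) -> Dict[str, float]:
    """Assumed stand-in: one localization score and one semantic score,
    each comparing the original image with the generated image."""
    return {
        "localization": box_iou(box_original, box_generated),
        "semantic": cosine_similarity(feat_original, feat_generated),
    }
```

In this sketch, the boxes would come from a grounding model run on each image and the feature vectors from any image encoder; both are left to the caller because the abstract does not specify them.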

Code & Models

Repositories

sayeh1994/vicca (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis

Methods: Diffusion