DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities

Nagur Shareef Shaik; Teja Krishna Cherukuri; Adnan Masood; Dong Hye Ye

arXiv:2511.05968 · cs.CV · November 11, 2025

TL;DR

This paper introduces DiA-gnostic VLVAE, a novel model that improves radiology report generation by disentangling shared and modality-specific features, making it robust to missing data and reducing hallucinations.

Contribution

The paper presents a disentangled alignment-constrained VLVAE that enhances robustness to missing modalities and improves report accuracy in radiology imaging.

Findings

01. Achieved competitive BLEU@4 scores on the IU X-Ray and MIMIC-CXR datasets.

02. Significantly outperformed state-of-the-art models in experiments.

03. Effectively disentangled shared and modality-specific features.
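
For readers unfamiliar with the BLEU@4 metric named in the findings, the following is a minimal sketch of a sentence-level BLEU score with uniform weights over 1- to 4-grams, a single reference, and no smoothing. The function names `ngrams` and `bleu4` and the example sentences are illustrative assumptions; the evaluation tooling actually used in the paper is not stated on this page, and published scores are normally computed at corpus level with standard toolkits.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU with uniform 1..4-gram weights (illustrative sketch).

    Assumptions: whitespace tokenization, one reference, no smoothing,
    so any n-gram order with zero matches yields a score of 0.0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4.0)


# Hypothetical report fragments, not taken from either dataset.
score = bleu4("the heart size is normal",
              "the heart size is within normal limits")
```

Because the candidate here is shorter than the reference, the brevity penalty lowers the score even though every candidate unigram appears in the reference.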

Abstract

The integration of medical images with clinical context is essential for generating accurate and clinically interpretable radiology reports. However, current automated methods often rely on resource-heavy Large Language Models (LLMs) or static knowledge graphs and struggle with two fundamental challenges in real-world clinical data: (1) missing modalities, such as incomplete clinical context, and (2) feature entanglement, where mixed modality-specific and shared information leads to suboptimal fusion and clinically unfaithful hallucinated findings. To address these challenges, we propose the DiA-gnostic VLVAE, which achieves robust radiology reporting through Disentangled Alignment. Our framework is designed to be resilient to missing modalities by disentangling shared and modality-specific features using a Mixture-of-Experts (MoE) based Vision-Language Variational Autoencoder (VLVAE).…
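
To make the idea of a mixture-of-experts posterior over a shared latent more concrete, here is a minimal NumPy sketch of how such a posterior can remain usable when one modality is absent. It is an illustration under stated assumptions, not the paper's implementation: the function names, the uniform mixture weights, the diagonal-Gaussian experts, and the toy parameters are all hypothetical, and the truncated abstract above does not specify the actual architecture or losses.

```python
import numpy as np


def moe_shared_posterior(experts, rng):
    """Sample a shared latent from a uniform mixture over available experts.

    experts: dict mapping a modality name to (mu, logvar) arrays, or to None
    when that modality is missing. Uniform mixing is an assumption made for
    this sketch.
    """
    available = {m: p for m, p in experts.items() if p is not None}
    if not available:
        raise ValueError("at least one modality must be present")
    names = sorted(available)
    # Pick one expert uniformly at random, then use the reparameterization
    # trick to sample from its diagonal Gaussian.
    chosen = names[rng.integers(len(names))]
    mu, logvar = available[chosen]
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps, chosen


def gaussian_kl(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to the standard normal."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)


rng = np.random.default_rng(0)
dim = 4
# Toy posterior parameters standing in for encoder outputs (hypothetical).
image_expert = (np.zeros(dim), np.zeros(dim))
text_expert = (np.ones(dim), np.full(dim, -2.0))

# Both modalities present: either expert may be selected.
z_full, src_full = moe_shared_posterior(
    {"image": image_expert, "text": text_expert}, rng)

# Clinical context missing: the mixture falls back to the image expert only.
z_missing, src_missing = moe_shared_posterior(
    {"image": image_expert, "text": None}, rng)
```

In this sketch, a missing modality simply drops out of the mixture, so the shared latent can still be sampled from whichever experts remain. Modality-specific latents would be handled by separate encoders, which are omitted here.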

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning