A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning

Zelin Zang; Wenyi Gu; Siqi Ma; Dan Yang; Yue Shen; Zhu Zhang; Guohui Fan; Wing-Kuen Ling; Fuji Yang

arXiv:2512.21583·cs.AI·December 29, 2025

Open Access

TL;DR

This paper introduces a multimodal diagnostic framework that combines vision-language models with logic-based reasoning to improve accuracy and interpretability in medical AI diagnostics.

Contribution

It presents a novel framework integrating vision-language alignment with logic tree reasoning, enhancing trustworthiness and interpretability in multimodal medical diagnosis.

Findings

01. Improved diagnostic accuracy on the MedXpertQA benchmark

02. More interpretable reasoning traces produced

03. Competitive performance on text-only tasks

Abstract

With the rapid growth of large language models (LLMs) and vision-language models (VLMs) in medicine, simply integrating clinical text and medical imaging does not guarantee reliable reasoning. Existing multimodal models often produce hallucinations or inconsistent chains of thought, limiting clinical trust. We propose a diagnostic framework built upon LLaVA that combines vision-language alignment with logic-regularized reasoning. The system includes an input encoder for text and images, a projection module for cross-modal alignment, a reasoning controller that decomposes diagnostic tasks into steps, and a logic tree generator that assembles stepwise premises into verifiable conclusions. Evaluations on MedXpertQA and other benchmarks show that our method improves diagnostic accuracy and yields more interpretable reasoning traces on multimodal tasks, while remaining competitive on…
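
The abstract describes a logic tree generator that assembles stepwise premises into verifiable conclusions. A minimal sketch of such a data structure is given below. It is not the authors' implementation: the names `LogicNode`, `build_tree`, `verify`, and `trace` are hypothetical, the statements are placeholders, and the rule that a conclusion holds only when every premise beneath it holds is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LogicNode:
    """One statement in the tree: a leaf premise or a derived conclusion.

    Hypothetical structure for illustration; not taken from the paper.
    """
    statement: str
    supported: bool = False  # only consulted for leaf premises
    premises: List["LogicNode"] = field(default_factory=list)

    def verify(self) -> bool:
        """A leaf holds if it is flagged as supported by evidence.

        An inner node holds only if all of its premises hold
        (assumed conjunction rule).
        """
        if not self.premises:
            return self.supported
        return all(p.verify() for p in self.premises)

    def trace(self, depth: int = 0) -> List[str]:
        """Return an indented, human-readable reasoning trace."""
        mark = "ok" if self.verify() else "unsupported"
        lines = [f"{'  ' * depth}[{mark}] {self.statement}"]
        for p in self.premises:
            lines.extend(p.trace(depth + 1))
        return lines


def build_tree(conclusion: str, steps: List[Tuple[str, bool]]) -> LogicNode:
    """Assemble (statement, supported) reasoning steps under one conclusion."""
    return LogicNode(conclusion, premises=[LogicNode(s, ok) for s, ok in steps])


if __name__ == "__main__":
    # Placeholder steps, as a reasoning controller might emit them.
    tree = build_tree(
        "candidate diagnosis X",
        [("finding A visible in image", True),
         ("history consistent with X", False)],
    )
    print("\n".join(tree.trace()))
```

In this sketch a single unsupported premise marks the whole conclusion as unsupported, so the printed trace shows which step failed. A real system would likely need a richer notion of support than a boolean flag.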

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)