Vision-Language Model Based Multi-Expert Fusion for CT Image Classification
Jianfa Bai, Kejin Lu, Runtian Yuan, Qingqiu Li, Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng

TL;DR
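This paper introduces a source-aware, multi-expert framework for COVID-19 CT classification across multiple sources. It combines a lung-aware 3D expert with two MedSigLIP-based experts (a slice-wise module and a Transformer-based inter-slice module) and fuses their outputs using a predicted source label. The Stage 1 model reaches a macro-F1 of 0.9711, and the Stage 2 experts reach AUC scores above 0.985.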
Contribution
It presents a novel three-stage multi-expert fusion approach that explicitly models source information to improve multi-source COVID-19 CT classification robustness.
Findings
The Stage 1 model achieved a macro-F1 of 0.9711.
The Stage 2 experts achieved AUC scores above 0.985.
The source classifier reached an accuracy above 91%.
Abstract
Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volumes for volumetric classification. Second, we develop two MedSigLIP-based experts: a slice-wise representation and probability learning module, and a Transformer-based inter-slice context modeling module for capturing cross-slice dependency. Third, we train a source classifier to predict the latent source identity of each test scan. By leveraging the predicted source information, we perform model fusion and voting based on different experts. On the validation set covering all four sources, the…
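The abstract does not spell out how the predicted source drives fusion and voting, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes each expert emits a COVID-19 probability per scan, that the source classifier emits a probability over the four sources, and that fusion is a weighted average with per-source weights. The expert names, the `FUSION_WEIGHTS` table, the 0.5 threshold, and the helper functions are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-source fusion weights (assumption, not from the paper).
# Keys are predicted source indices 0..3; values weight each expert's output.
FUSION_WEIGHTS: Dict[int, Dict[str, float]] = {
    0: {"lung3d": 0.50, "slice": 0.25, "transformer": 0.25},
    1: {"lung3d": 0.25, "slice": 0.50, "transformer": 0.25},
    2: {"lung3d": 0.25, "slice": 0.25, "transformer": 0.50},
    3: {"lung3d": 0.50, "slice": 0.25, "transformer": 0.25},
}


def predict_source(source_probs: List[float]) -> int:
    """Return the index of the most probable source (argmax)."""
    return max(range(len(source_probs)), key=lambda i: source_probs[i])


def fuse(
    expert_probs: Dict[str, float],
    source_probs: List[float],
    weights: Dict[int, Dict[str, float]] = FUSION_WEIGHTS,
    threshold: float = 0.5,
) -> Tuple[int, float, int]:
    """Weighted-average fusion, with weights selected by the predicted source.

    Returns (predicted_source, fused_probability, binary_label).
    """
    source = predict_source(source_probs)
    w = weights[source]
    # Normalise over the experts actually present for this scan.
    total = sum(w[name] for name in expert_probs)
    fused = sum(w[name] * p for name, p in expert_probs.items()) / total
    return source, fused, int(fused >= threshold)


def majority_vote(expert_probs: Dict[str, float], threshold: float = 0.5) -> int:
    """Unweighted alternative: label 1 if a strict majority of experts vote positive."""
    votes = sum(p >= threshold for p in expert_probs.values())
    return int(2 * votes > len(expert_probs))


if __name__ == "__main__":
    # Illustrative values only; these are not results from the paper.
    experts = {"lung3d": 0.9, "slice": 0.4, "transformer": 0.6}
    source_probs = [0.1, 0.7, 0.1, 0.1]
    print(fuse(experts, source_probs))
    print(majority_vote(experts))
```

Selecting weights by predicted source lets the ensemble lean on whichever expert is assumed to generalise best for that source; in practice such weights would presumably be tuned on per-source validation data.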
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
