MedMimic: Physician-Inspired Multimodal Fusion for Early Diagnosis of Fever of Unknown Origin
Minrui Chen, Yi Zhou, Huidong Jiang, Yuhan Zhu, Guanjie Zou, Minqi Chen, Rong Tian, Hiroto Saigo

TL;DR
MedMimic is a multimodal framework that combines pretrained models and clinical data to improve early diagnosis of Fever of Unknown Origin, demonstrating high accuracy on real patient data.
Contribution
It introduces a novel multimodal fusion network inspired by real-world diagnosis, integrating imaging and clinical data for FUO classification.
Findings
Achieved macro-AUROC scores ranging from 0.8654 to 0.9291 across seven tasks.
Outperformed conventional machine learning and single-modality deep learning methods.
Validated effectiveness through ablation studies and five-fold cross-validation.
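For readers unfamiliar with the metric, the sketch below shows one common way to compute macro-AUROC: the unweighted mean of one-vs-rest AUROCs over all classes. It assumes this standard definition applies here; the summary does not spell out the exact evaluation code, and the function names `binary_auroc` and `macro_auroc` are illustrative, not taken from the paper.

```python
def binary_auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auroc(y_true, y_score, num_classes):
    """Unweighted mean of one-vs-rest AUROCs (assumed definition).

    y_true:  list of integer class labels
    y_score: list of per-sample score lists, one score per class
    """
    per_class = []
    for c in range(num_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(binary_auroc(labels, scores))
    return sum(per_class) / num_classes
```

Because every class contributes equally to the mean, a rare diagnosis weighs as much as a common one, which is the usual reason to report the macro rather than the micro average on imbalanced clinical data.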
Abstract
Fever of unknown origin (FUO) remains a diagnostic challenge. MedMimic is introduced as a multimodal framework inspired by real-world diagnostic processes. It uses pretrained models such as DINOv2, Vision Transformer, and ResNet-18 to convert high-dimensional 18F-FDG PET/CT imaging into low-dimensional, semantically meaningful features. A learnable self-attention-based fusion network then integrates these imaging features with clinical data for classification. Using 416 FUO patient cases collected at Sichuan University West China Hospital between 2017 and 2023, the multimodal fusion classification network (MFCN) achieved macro-AUROC scores ranging from 0.8654 to 0.9291 across seven tasks, outperforming conventional machine learning and single-modality deep learning methods. Ablation studies and five-fold cross-validation further validated its effectiveness. By combining the strengths of pretrained…
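The fusion idea in the abstract, imaging features and clinical data combined through self-attention before classification, can be illustrated with a minimal NumPy sketch. This is not the authors' MFCN: the single attention head, the two-token layout, the mean pooling, all dimensions, and the names `fuse_and_classify` and `init_params` are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, img_dim, cli_dim, d_model, num_classes):
    """Random (untrained) weights; all shapes are hypothetical."""
    shapes = {
        "w_img": (img_dim, d_model),   # projects imaging features
        "w_cli": (cli_dim, d_model),   # projects clinical features
        "w_q": (d_model, d_model),
        "w_k": (d_model, d_model),
        "w_v": (d_model, d_model),
        "w_out": (d_model, num_classes),
    }
    return {name: rng.normal(scale=0.1, size=shape)
            for name, shape in shapes.items()}


def fuse_and_classify(image_feat, clinical_feat, params):
    """Single-head self-attention over two modality tokens, then a linear head."""
    # Project each modality to a shared width so the tokens can attend to each other.
    img_tok = image_feat @ params["w_img"]
    cli_tok = clinical_feat @ params["w_cli"]
    tokens = np.stack([img_tok, cli_tok])            # (2, d_model)

    q = tokens @ params["w_q"]
    k = tokens @ params["w_k"]
    v = tokens @ params["w_v"]
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))   # (2, 2), rows sum to 1

    fused = (attn @ v).mean(axis=0)                  # pool tokens to (d_model,)
    probs = softmax(fused @ params["w_out"])         # (num_classes,)
    return probs, attn


rng = np.random.default_rng(0)
params = init_params(rng, img_dim=512, cli_dim=20, d_model=16, num_classes=3)
image_feat = rng.normal(size=512)    # stand-in for a pretrained-encoder embedding
clinical_feat = rng.normal(size=20)  # stand-in for tabular clinical variables
probs, attn = fuse_and_classify(image_feat, clinical_feat, params)
```

The attention matrix `attn` shows how much each modality token draws on the other, which is the mechanism that lets a fusion network of this kind weigh imaging against clinical evidence per patient.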
Taxonomy
Topics: Hematological disorders and diagnostics
Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax
