TL;DR
The paper introduces a novel in-context learning framework for medical diagnosis using multimodal large language models, combining discriminative exemplar selection and self-refined experience summarization to improve performance without model fine-tuning.
Contribution
The paper proposes a Clinician Mimetic Workflow that combines Discriminative Exemplar Coreset Selection (DECS) with Self-Refined Experience Summarization (SRES), and reports state-of-the-art results for parameter-efficient medical in-context learning.
Findings
The framework outperforms zero-shot general-purpose and medical MLLMs on the MedMNIST 2D datasets.
It achieves performance comparable to fully supervised and fine-tuned models.
It sets new benchmarks for parameter-efficient medical in-context learning.
Abstract
General Multimodal Large Language Models (MLLMs) often underperform in capturing domain-specific nuances in medical diagnosis, trailing behind fully supervised baselines. Although fine-tuning provides a remedy, the high costs of expert annotation and massive computational overhead limit its scalability. To bridge this gap without updating the weights of the pre-trained backbone of the MLLM, we propose a Clinician Mimetic Workflow. This is a novel In-Context Learning (ICL) framework designed to synergize Discriminative Exemplar Coreset Selection (DECS) and Self-Refined Experience Summarization (SRES). Specifically, DECS simulates a clinician's ability to reference "anchor cases" by selecting discriminative visual coresets from noisy data at the computational level; meanwhile, SRES mimics the cognition and reflection in clinical diagnosis by distilling diverse rollouts into a dynamic…
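The abstract describes DECS only at a high level, so the following is a minimal sketch of one generic way to pick discriminative "anchor cases" from image embeddings, not the paper's actual selection criterion. The function name `select_exemplars`, the centroid-margin score, and the parameter `k_per_class` are all assumptions made for illustration; the embeddings are assumed to come from any frozen visual encoder.

```python
import numpy as np


def select_exemplars(embeddings, labels, k_per_class=2):
    """Pick, per class, the k samples with the largest centroid margin.

    margin = distance to the nearest other-class centroid
             minus distance to the sample's own class centroid.
    This scoring rule is a hypothetical stand-in, NOT the paper's DECS criterion.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if len(classes) < 2:
        raise ValueError("need at least two classes to compute a margin")

    # One centroid per class, in the same order as `classes`.
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # dists[i, j] = Euclidean distance from sample i to the centroid of classes[j].
    dists = np.linalg.norm(embeddings[:, None, :] - centroids[None, :, :], axis=2)

    selected = {}
    for j, c in enumerate(classes):
        idx = np.flatnonzero(labels == c)
        own = dists[idx, j]
        nearest_other = np.delete(dists[idx], j, axis=1).min(axis=1)
        margin = nearest_other - own
        # Highest margin first; samples close to another class rank last.
        order = np.argsort(-margin, kind="stable")[:k_per_class]
        selected[c.item()] = idx[order].tolist()
    return selected
```

The returned indices could then be used to assemble the in-context exemplars for a prompt. Samples that sit near another class's centroid, a rough proxy for noisy or ambiguous cases, receive a low margin and are skipped.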
