Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification
Sirui Li, Li Lin, Yijin Huang, Pujin Cheng, Xiaoying Tang

TL;DR
This paper introduces TFA-LT, a lightweight two-stage adaptation method that significantly improves long-tailed medical image classification by leveraging foundation models with minimal GPU resources.
Contribution
The paper proposes a novel, simple, and efficient two-stage adaptation strategy for foundation models tailored to long-tailed medical image classification tasks.
Findings
Achieves up to 27.1% accuracy improvement
Uses only 6.1% of the GPU memory of the current best methods
Effective in handling data imbalance in medical datasets
Abstract
In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models. Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning. However, their limited medical-specific pretraining hinders their performance in medical image classification relative to natural images. To address this issue, we propose a novel Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT). We adopt a two-stage training strategy, integrating representations from the foundation model using just two linear adapters and a single ensembler for balanced outcomes. Experimental results on two long-tailed medical image datasets validate the simplicity, lightweight design, and efficiency of our approach:…
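The abstract names the building blocks (frozen foundation-model features, two linear adapters, one ensembler) but not their exact wiring. The following is a minimal forward-pass sketch in plain Python of how such pieces could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the residual form of the adapters, the cosine-similarity logits against class text embeddings, the fixed-weight averaging used as a stand-in for the ensembler, and all names (`adapter_logits`, `ensemble`, `predict`, `scale`, `alpha`). Training of either stage is not shown.

```python
import math
import random


def linear(x, weight, bias):
    """Apply y = W x + b to a single feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def init_linear(out_dim, in_dim, rng):
    """Small random weights and zero bias for one linear adapter."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adapter_logits(image_feat, text_feats, adapter, scale=10.0):
    """Hypothetical adapter branch (assumed design, not from the paper):
    residual linear map on the frozen image feature, followed by scaled
    cosine similarity to each class text embedding."""
    weight, bias = adapter
    adapted = [a + b for a, b in zip(image_feat, linear(image_feat, weight, bias))]
    adapted = l2_normalize(adapted)
    return [scale * sum(a * t for a, t in zip(adapted, l2_normalize(tf)))
            for tf in text_feats]


def ensemble(logits_a, logits_b, alpha=0.5):
    """Assumed stand-in for the ensembler: a fixed-weight average of the
    two adapters' logits. The paper's ensembler may well be learned."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(logits_a, logits_b)]


def predict(image_feat, text_feats, adapter_a, adapter_b, alpha=0.5):
    """Combine both adapter branches and return class probabilities."""
    logits_a = adapter_logits(image_feat, text_feats, adapter_a)
    logits_b = adapter_logits(image_feat, text_feats, adapter_b)
    return softmax(ensemble(logits_a, logits_b, alpha))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, num_classes = 8, 3
    # Random vectors stand in for frozen foundation-model embeddings.
    image_feat = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    text_feats = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_classes)]
    adapter_a = init_linear(dim, dim, rng)
    adapter_b = init_linear(dim, dim, rng)
    print(predict(image_feat, text_feats, adapter_a, adapter_b))
```

Because the backbone stays frozen in this sketch, the only parameters are the two small linear maps, which is consistent in spirit with the low GPU memory footprint the summary reports, though the sketch itself measures nothing.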
Taxonomy
Topics: Brain Tumor Detection and Classification · COVID-19 diagnosis using AI · Image Retrieval and Classification Techniques
