LLM is Not All You Need: A Systematic Evaluation of ML vs. Foundation Models for text and image based Medical Classification
Meet Raval, Tejul Pandit, Dhvani Upadhyay

TL;DR
This study systematically compares traditional ML models with foundation models for medical classification across text and image data, finding that classical ML outperforms modern transformer-based models in many scenarios.
Contribution
It provides a comprehensive benchmark demonstrating that traditional ML models remain highly effective, and that PEFT strategies may not always improve foundation model performance in medical tasks.
Findings
Classical ML models outperform foundation models on most tasks.
LoRA-tuned models perform poorly when only minimally fine-tuned.
Foundation models show competitive results only in certain image classification tasks.
Abstract
The combination of multimodal Vision-Language Models (VLMs) and Large Language Models (LLMs) opens up new possibilities for medical classification. This work offers a rigorous, unified benchmark that contrasts traditional Machine Learning (ML) with contemporary transformer-based techniques on four publicly available datasets covering text and image modalities and both binary and multiclass tasks. We evaluated three model classes for each task: Classical ML (LR, LightGBM, ResNet-50), Prompt-Based LLMs/VLMs (Gemini 2.5), and Fine-Tuned PEFT Models (LoRA-adapted Gemma3 variants). All experiments used consistent data splits and aligned metrics. According to our results, traditional ML models set a high standard by consistently achieving the best overall performance across most medical categorization tasks. This was especially true for structured text-based datasets,…
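To make "consistent data splits and aligned metrics" concrete, here is a minimal sketch of what such a setup could look like. It is not the authors' code. The helper names (`stratified_split`, `macro_f1`, `evaluate`), the 80/20 split, the fixed seed, and the choice of accuracy and macro-F1 are all assumptions made for illustration; the excerpt does not say which splits or metrics the paper used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving class proportions.

    The fixed seed makes the split reproducible, so every model class
    can be scored on exactly the same held-out examples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(predict_fn, inputs, labels, test_idx):
    """Score any model (hypothetical predict_fn) on the shared test indices."""
    y_true = [labels[i] for i in test_idx]
    y_pred = [predict_fn(inputs[i]) for i in test_idx]
    return {"accuracy": accuracy(y_true, y_pred),
            "macro_f1": macro_f1(y_true, y_pred)}
```

Under these assumptions, a classical model, a prompted LLM, and a LoRA-tuned model would each be wrapped as a `predict_fn` and passed to `evaluate` with the same `test_idx`, so any difference in scores comes from the model and not from the data partition or the metric definition.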
Taxonomy
Topics: COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
