Benchmarking and Adapting On-Device LLMs for Clinical Decision Support

Alif Munim, Jun Ma, Omar Ibrahim, Alhusain Abdalla, Shuolin Yin, Leo Chen, and Bo Wang

arXiv:2601.03266 · cs.CL · April 29, 2026

TL;DR

This study benchmarks on-device open-source large language models for clinical decision support and finds them competitive with proprietary models and adaptable through fine-tuning, which supports privacy-preserving healthcare applications.

Contribution

The paper compares open-source on-device LLMs with proprietary models across clinical tasks and shows that fine-tuning improves their clinical diagnostic accuracy.

Findings

01

On-device models perform comparably to or better than some proprietary models.

02

Fine-tuning significantly improves the diagnostic accuracy of on-device models, bringing it close to that of proprietary models.

03

Most diagnostic errors are clinically plausible, not off-topic.

Abstract

Large language models (LLMs) have rapidly advanced in clinical decision-making, yet the deployment of proprietary systems is hindered by privacy concerns and reliance on cloud-based infrastructure. Open-source alternatives allow local inference but often have large model sizes that limit their use in resource-constrained clinical settings. Here, we benchmark on-device LLMs from the gpt-oss (20b, 120b), Qwen3.5 (9B, 27B, 35B), and Gemma 4 (31B) families across three representative clinical tasks: general disease diagnosis, specialty-specific (ophthalmology) diagnosis and management, and simulation of human expert grading and evaluation. We compare their performance with state-of-the-art proprietary models (GPT-5.1, GPT-5-mini, and Gemini 3.1 Pro) and a leading open-source model (DeepSeek-R1), and we further evaluate the adaptability of on-device systems by fine-tuning gpt-oss-20b and…
