Evaluating Locally Run Large Language Models (Gemma 2, Mistral Nemo, and Llama 3) for Outpatient Otorhinolaryngology Care: Retrospective Study

Christoph Raphael Buhr; Christopher Seifen; Katharina Bahr-Hamm; Tilman Huppertz; Johannes Pordzik; Harry Smith; Tom Kelsey; Andrew Blaikie; Christoph Matthias; Sebastian Kuhn; Jonas Eckrich

PMC · DOI: 10.2196/76896 · November 25, 2025

Open Access

TL;DR

This study compares locally run large language models with human doctors in providing outpatient otorhinolaryngology care. It finds that although the models underperform the doctors, they show potential for future use.

Contribution

The study evaluates locally run LLMs (Gemma 2, Mistral Nemo, Llama 3) for real-world outpatient otorhinolaryngology (ORL) care, an approach intended to address data protection concerns.

Findings

1. ORL doctors outperformed the LLMs in medical adequacy and safety ratings.

2. Locally run LLMs showed potential but received higher risk ratings than the human recommendations.

3. LLM-generated information had minimal influence on clinicians' diagnoses.

Abstract

Large language models (LLMs) have great potential to improve the work of clinicians and make it more efficient. Previous studies have mainly focused on web-based services, such as ChatGPT, often with simulated cases. For the processing of personalized patient data, web-based services raise major data protection concerns. Ensuring compliance with data protection and medical device regulations therefore remains a critical challenge for adopting LLMs in clinical settings. This retrospective single-center study aimed to evaluate locally run LLMs (Gemma 2, Mistral Nemo, and Llama 3) in providing diagnoses and treatment recommendations for real-world outpatient cases in otorhinolaryngology (ORL). Outpatient cases (n=30) from regular consultation hours and the emergency service at a university hospital ORL outpatient department were randomly selected. Documentation by ORL doctors, including…

Figures: 4

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Tracheal and airway disorders · Sinusitis and nasal conditions