Open- and closed-source LLMs in medical and engineering education

Liping Sun; Ya Li; Hongxing Kan; Jianhua Shu; Huanqing Xu; Chengle Li; Guokun Shi; Ziyang Wang; Xueqi Wang; Li Jin

PMC · DOI:10.3389/fmed.2025.1751813 · January 13, 2026

Open Access

TL;DR

This paper compares open-source and closed-source large language models in medical and engineering education, finding that open-source models like DeepSeek perform well and can be improved with prompt engineering.

Contribution

The study introduces and evaluates prompt engineering strategies to enhance open-source LLMs for educational tasks in medical and engineering fields.

Findings

01

DeepSeek outperformed other models across all question types with the highest accuracy rates.

02

Prompt engineering significantly improved model accuracy, with DeepSeek exceeding 95% accuracy for all question types.

03

Short-answer questions achieved up to 97% accuracy across four LLMs, highlighting the effectiveness of prompt engineering in problem-solving tasks.

Abstract

The rapid development of large language models (LLMs), such as the closed-source GPT-4, has revolutionized education by assisting student learning. However, open-source LLMs, which offer advantages in accessibility, customization, and transparency, remain under-utilized in both medical and engineering education. This work systematically evaluates the performance of open-source LLMs (DeepSeek, GLM-4, Kimi) and the closed-source GPT-4 in assisting the learning of medical and engineering students across diverse question types. We found that DeepSeek outperformed the other models on all question types, achieving the highest accuracy rates. To further improve LLM-generated responses, prompt engineering strategies, such as role-playing, generated knowledge prompting, chain-of-thought prompting, few-shot prompting, and output style, were introduced. Post-training evaluations showed significant…
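The five prompt engineering strategies named in the abstract can be combined in a single prompt template. The sketch below is only an illustration under stated assumptions: `PromptSpec` and `build_prompt` are hypothetical names, the prompt wording is invented for this example, and the background facts are assumed to come from an earlier model call. It is not the prompt format used in the paper.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """Hypothetical container for the five strategies named in the abstract."""
    role: str                         # role-playing
    background_facts: list[str]       # generated knowledge (assumed pre-generated)
    examples: list[tuple[str, str]]   # few-shot (question, answer) pairs
    output_style: str                 # output style instruction
    use_chain_of_thought: bool = True # chain-of-thought prompting


def build_prompt(spec: PromptSpec, question: str) -> str:
    """Assemble one text prompt from the strategy components."""
    parts = [f"You are {spec.role}."]
    if spec.background_facts:
        parts.append("Relevant background:")
        parts.extend(f"- {fact}" for fact in spec.background_facts)
    for example_question, example_answer in spec.examples:
        parts.append(f"Example question: {example_question}")
        parts.append(f"Example answer: {example_answer}")
    if spec.use_chain_of_thought:
        parts.append("Reason step by step before giving the final answer.")
    parts.append(f"Answer format: {spec.output_style}")
    parts.append(f"Question: {question}")
    return "\n".join(parts)


# Illustrative values only; none of these come from the paper's question set.
spec = PromptSpec(
    role="an experienced medical school tutor",
    background_facts=["Vitamin C is required for collagen synthesis."],
    examples=[("Which vitamin deficiency causes scurvy?", "Vitamin C")],
    output_style="one short sentence",
)
print(build_prompt(spec, "Which vitamin deficiency causes rickets?"))
```

Keeping each strategy as a separate field makes it possible to switch one off at a time, which is one way the contribution of each strategy to accuracy could be compared.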

Figures (6)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning