# Open- and closed-source LLMs in medical and engineering education

**Authors:** Liping Sun, Ya Li, Hongxing Kan, Jianhua Shu, Huanqing Xu, Chengle Li, Guokun Shi, Ziyang Wang, Xueqi Wang, Li Jin

PMC · DOI: 10.3389/fmed.2025.1751813 · 2026-01-13

## TL;DR

This paper compares open-source LLMs (DeepSeek, GLM-4, Kimi) with closed-source GPT-4 on question answering for medical and engineering education. DeepSeek was the most accurate model on every question type, and prompt engineering raised its accuracy above 95%.

## Contribution

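The study benchmarks open-source LLMs against GPT-4 on diverse question types from medical and engineering education, then introduces and evaluates prompt engineering strategies (including role-playing, generated knowledge prompting, chain-of-thought prompting, few-shot prompting, and output style) for improving their responses.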

## Key findings

- DeepSeek achieved the highest accuracy on every question type, ahead of GLM-4, Kimi, and GPT-4.
- Prompt engineering significantly improved model accuracy, with DeepSeek exceeding 95% accuracy on all question types.
- Short-answer questions saw the best results, with accuracy reaching up to 97% across the four LLMs, which highlights the role of prompt engineering in problem-solving tasks.
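The findings above are stated as accuracy per model and question type. A minimal sketch of how such a breakdown could be computed is shown here. The record layout and the function name `accuracy_by_type` are assumptions for illustration and are not taken from the paper, which does not describe its scoring code in this summary.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return accuracy for each (model, question_type) pair.

    `records` is an iterable of (model, question_type, is_correct) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for model, question_type, is_correct in records:
        key = (model, question_type)
        totals[key] += 1
        correct[key] += int(is_correct)
    # Fraction of correctly answered questions per (model, question type).
    return {key: correct[key] / totals[key] for key in totals}
```

Grouping by both model and question type keeps the comparison aligned with how the results are reported: one accuracy figure per model for each kind of question.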

## Abstract

The rapid development of large language models (LLMs), such as the closed-source GPT-4, has revolutionized education by assisting student learning. However, open-source LLMs, which offer advantages in accessibility, customization, and transparency, remain under-utilized in both medical and engineering education. This work systematically evaluates the performance of open-source LLMs (DeepSeek, GLM-4, Kimi) and closed-source GPT-4 in assisting the learning of medical and engineering students through diverse question types. We found that DeepSeek outperformed the other models on all question types, achieving the highest accuracy rates. To further improve LLM-generated responses, prompt engineering strategies, such as role-playing, generated knowledge prompting, chain-of-thought prompting, few-shot prompting, and output style, were introduced. Post-training evaluations showed significant improvements in model accuracy, with DeepSeek exceeding 95% accuracy on all question types. Among the question types, short-answer questions yielded the best responses, with accuracy reaching up to 97% across the four LLMs, indicating the important role of prompt engineering in problem-solving tasks. The findings highlight the potential of open-source models in supporting medical and engineering education, bridging a critical gap in open-source LLM evaluation and advocating for their wider integration into academic settings.
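To make the prompt engineering strategies named in the abstract concrete, the following is a minimal sketch of how they might be combined into a single prompt. It is not the authors' prompt template: the function `build_prompt`, its parameters, and the wording of each instruction are assumptions for illustration. In particular, generated knowledge prompting normally obtains background facts from an earlier model call; here those facts are simply passed in as `knowledge`.

```python
def build_prompt(question, role=None, knowledge=None, examples=(),
                 chain_of_thought=False, output_style=None):
    """Assemble a prompt from optional strategy components (hypothetical helper)."""
    parts = []
    if role:
        # Role-playing: assign the model a persona.
        parts.append(f"You are {role}.")
    if knowledge:
        # Generated knowledge prompting: facts assumed to come from a prior model call.
        facts = "\n".join(f"- {fact}" for fact in knowledge)
        parts.append("Relevant background:\n" + facts)
    for example_question, example_answer in examples:
        # Few-shot prompting: worked question/answer pairs.
        parts.append(f"Example question: {example_question}\n"
                     f"Example answer: {example_answer}")
    parts.append(f"Question: {question}")
    if chain_of_thought:
        # Chain-of-thought prompting: ask for intermediate reasoning.
        parts.append("Think through the problem step by step "
                     "before giving the final answer.")
    if output_style:
        # Output style: constrain the form of the answer.
        parts.append(f"Format the answer as: {output_style}")
    return "\n\n".join(parts)
```

A call such as `build_prompt("What is Ohm's law?", role="an experienced engineering tutor", chain_of_thought=True)` produces a prompt with the persona first, the question second, and the step-by-step instruction last. Keeping each strategy optional makes it possible to test them one at a time or in combination.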

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12834738/full.md

---
Source: https://tomesphere.com/paper/PMC12834738