Open- and closed-source LLMs in medical and engineering education
Liping Sun, Ya Li, Hongxing Kan, Jianhua Shu, Huanqing Xu, Chengle Li, Guokun Shi, Ziyang Wang, Xueqi Wang, Li Jin

TL;DR
This paper compares open-source and closed-source large language models in medical and engineering education, finding that open-source models like DeepSeek perform well and can be improved with prompt engineering.
Contribution
The study introduces and evaluates prompt engineering strategies to enhance open-source LLMs for educational tasks in medical and engineering fields.
Findings
DeepSeek outperformed other models across all question types with the highest accuracy rates.
Prompt engineering significantly improved model accuracy, with DeepSeek exceeding 95% accuracy for all question types.
Accuracy on short-answer questions reached up to 97% across the four LLMs, highlighting the effectiveness of prompt engineering in problem-solving tasks.
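As a rough illustration of how per-question-type accuracy figures like those above could be tallied, here is a minimal sketch. The function name `accuracy_by_type`, the record layout, and the toy data are assumptions for illustration only; they are not the paper's evaluation code or results, and "multiple-choice" is a hypothetical question-type label.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_type(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the fraction of correct answers for each question type.

    Each record is (question_type, is_correct). This layout is an
    assumption made for the sketch, not the paper's data format.
    """
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for question_type, is_correct in records:
        totals[question_type] += 1
        correct[question_type] += int(is_correct)
    return {qtype: correct[qtype] / totals[qtype] for qtype in totals}


# Toy, made-up grading records (not data from the paper).
toy_records = [
    ("short-answer", True),
    ("short-answer", True),
    ("multiple-choice", True),
    ("multiple-choice", False),
]
toy_scores = accuracy_by_type(toy_records)
```

Running the same function once per model on the same question set would give a table of models by question type, which is the kind of comparison the findings describe.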
Abstract
The rapid development of large language models (LLMs), such as the closed-source GPT-4, has revolutionized education by assisting student learning. However, open-source LLMs, which offer the advantages of accessibility, customization, and transparency, remain under-utilized in both medical and engineering education. This work systematically evaluates the performance of open-source LLMs (DeepSeek, GLM-4, Kimi) and the closed-source GPT-4 in assisting the learning of medical and engineering students across diverse question types. We found that DeepSeek outperformed the other models on all question types, achieving the highest accuracy rates. To further improve LLM-generated responses, prompt engineering strategies, such as role-playing, generated knowledge prompting, chain-of-thought prompting, few-shot prompting, and output style, were introduced. Post-training evaluations showed significant…
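The prompt engineering strategies named in the abstract can be combined in a single prompt. The sketch below shows one possible way to do that; the function `build_prompt`, its parameters, and the wording of each section are assumptions for illustration, not the prompts used in the paper. In generated knowledge prompting the background facts would normally come from an earlier model call; here they are simply passed in as a list.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    question: str,
    role: str,
    background_facts: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    output_style: str,
) -> str:
    """Assemble one prompt string from five strategies (hypothetical layout)."""
    parts: List[str] = []
    # Role-playing: tell the model which persona to adopt.
    parts.append(f"You are {role}.")
    # Generated knowledge: facts assumed to be produced in a prior step.
    if background_facts:
        parts.append("Relevant background:")
        parts.extend(f"- {fact}" for fact in background_facts)
    # Few-shot prompting: worked question/answer pairs.
    for example_question, example_answer in examples:
        parts.append(f"Example question: {example_question}")
        parts.append(f"Example answer: {example_answer}")
    parts.append(f"Question: {question}")
    # Chain-of-thought: ask for intermediate reasoning before the answer.
    parts.append("Reason step by step before giving the final answer.")
    # Output style: constrain the format of the response.
    parts.append(f"Format: {output_style}")
    return "\n".join(parts)


# Illustrative inputs only; none of this text comes from the paper.
demo_prompt = build_prompt(
    question="Why does a beam deflect more when its span increases?",
    role="an engineering tutor",
    background_facts=["Deflection depends on load, span, and stiffness."],
    examples=[("What is stress?", "Force divided by area.")],
    output_style="a short paragraph followed by a one-line summary",
)
```

The resulting string can be sent to any of the evaluated models through whatever client is available; keeping each strategy in its own section makes it easy to switch one off and compare accuracy with and without it.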
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning
