DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task
Honglin Xiong, Sheng Wang, Yitao Zhu, Zihao Zhao, Yuxiao Liu, Linlin Huang, Qian Wang, Dinggang Shen

TL;DR
DoctorGLM is a cost-effective, Chinese medical dialogue model fine-tuned from ChatGLM-6B, aiming to improve healthcare AI accessibility and performance with a quick, affordable training process.
Contribution
This work demonstrates fine-tuning a large Chinese language model for medical dialogue tasks using accessible hardware, making healthcare AI development more feasible for hospitals.
Findings
Fine-tuned ChatGLM-6B on Chinese medical dialogues in 13 hours on a single A100 80G GPU
Achieved affordable healthcare-specific LLM training
Shared initial model for community feedback
Abstract
The recent progress of large language models (LLMs), including ChatGPT and GPT-4, in comprehending and responding to human instructions has been remarkable. Nevertheless, these models typically perform better in English and have not been explicitly trained for the medical domain, resulting in suboptimal precision in diagnoses, drug recommendations, and other medical advice. Additionally, training and deploying a dialogue model is still believed to be impossible for hospitals, hindering the promotion of LLMs. To tackle these challenges, we have collected databases of medical dialogues in Chinese with ChatGPT's help and adopted several techniques to train an easy-deploy LLM. Remarkably, we were able to fine-tune the ChatGLM-6B on a single A100 80G in 13 hours, which means having a healthcare-purpose LLM can be very affordable. DoctorGLM is currently an early-stage engineering attempt and…
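To make the affordability claim concrete, the arithmetic behind it can be sketched as below. Only the 13-hour duration and the single A100 80G come from the abstract. The function name `training_cost` and the hourly rental rate are hypothetical placeholders, so the result illustrates the calculation and is not a figure reported by the authors.

```python
def training_cost(hours: float, num_gpus: int, hourly_rate: float) -> float:
    """Return the total GPU rental cost of a training run.

    hours       -- wall-clock training time per GPU
    num_gpus    -- number of GPUs used in parallel
    hourly_rate -- rental price per GPU per hour (any currency)
    """
    if hours < 0 or num_gpus < 1 or hourly_rate < 0:
        raise ValueError("hours and hourly_rate must be >= 0, num_gpus >= 1")
    return hours * num_gpus * hourly_rate


# Figures stated in the abstract: one A100 80G, 13 hours.
HOURS = 13
NUM_GPUS = 1

# ASSUMPTION: a placeholder rental rate, not taken from the paper.
# Substitute the actual price of your cloud provider or cluster.
ASSUMED_HOURLY_RATE = 2.0

cost = training_cost(HOURS, NUM_GPUS, ASSUMED_HOURLY_RATE)
print(f"Estimated fine-tuning cost: {cost:.2f} (at the assumed rate)")
```

Because the cost scales linearly with the rate, a single fine-tuning run stays at 13 GPU-hours of spend whatever price is plugged in.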
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections
