Technical Report: Small Language Model for Japanese Clinical and Medicine
Shogo Watanabe

TL;DR
This report introduces NCVC-slm-1, a small 1-billion-parameter Japanese language model specialized in clinical and medical texts, demonstrating high performance on multiple medical NLP tasks.
Contribution
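The report presents a small Japanese clinical language model trained on text classified as high quality, using specialized pre-processing, a morphological analyzer, and a tokenizer. After fine-tuning, it achieved the highest scores on 6 of the 8 JMED-LLM tasks in a comparison with other large language models.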
Findings
After fine-tuning, NCVC-slm-1 achieved the highest scores on 6 of the 8 JMED-LLM tasks.
The model effectively handles clinical and medical Japanese text.
It demonstrates the feasibility of small models in medical NLP applications.
Abstract
This report presents a small language model (SLM) for Japanese clinical and medical text, named NCVC-slm-1. This 1B-parameter model was trained on Japanese text classified as high quality. Moreover, NCVC-slm-1 was augmented with clinical and medical content covering a variety of diseases, drugs, and examinations. Using carefully designed pre-processing and a specialized morphological analyzer and tokenizer, this small, lightweight model not only generated text but also indicated the feasibility of understanding clinical and medical text. Compared with other large language models, a fine-tuned NCVC-slm-1 achieved the highest scores on 6 of the 8 tasks in JMED-LLM. This result indicates that an SLM can feasibly perform several downstream tasks in the clinical and medical field. Hopefully, NCVC-slm-1 will be contributed…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
