A Japanese Language Model and Three New Evaluation Benchmarks for Pharmaceutical NLP

Shinnosuke Ono; Issey Sukeda; Takuro Fujii; Kosei Buma; Shunsuke Sasaki

arXiv:2505.16661·cs.CL·September 10, 2025

TL;DR

This paper introduces a Japanese pharmaceutical language model, continually pretrained on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens, along with three new benchmarks for pharmaceutical NLP. The model outperforms existing open models and is competitive with commercial ones, while consistency reasoning between paired statements remains difficult even for GPT-4o.

Contribution

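The work contributes a Japanese pharmaceutical-domain language model and three benchmarks for pharmaceutical NLP: YakugakuQA (national pharmacist licensing exam questions), NayoseQA (cross-lingual synonym and terminology normalization), and SogoCheck (consistency reasoning between paired statements).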

Findings

1. The model outperforms existing open-source medical LLMs on pharmaceutical tasks.

2. It performs competitively with commercial models such as GPT-4o, particularly on terminology-heavy and knowledge-based tasks.

3. Cross-sentence consistency reasoning remains difficult: even GPT-4o performs poorly on SogoCheck.

Abstract

We present a Japanese domain-specific language model for the pharmaceutical field, developed through continual pretraining on 2 billion Japanese pharmaceutical tokens and 8 billion English biomedical tokens. To enable rigorous evaluation, we introduce three new benchmarks: YakugakuQA, based on national pharmacist licensing exams; NayoseQA, which tests cross-lingual synonym and terminology normalization; and SogoCheck, a novel task designed to assess consistency reasoning between paired statements. We evaluate our model against both open-source medical LLMs and commercial models, including GPT-4o. Results show that our domain-specific model outperforms existing open models and achieves competitive performance with commercial ones, particularly on terminology-heavy and knowledge-based tasks. Interestingly, even GPT-4o performs poorly on SogoCheck, suggesting that cross-sentence…
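As a rough illustration of how a licensing-exam benchmark such as YakugakuQA could be scored, the sketch below computes exact-match accuracy over multiple-choice items. It assumes each item may have more than one correct choice and that a prediction counts only if its label set matches the gold set exactly. The `Item` class, its fields, and `exact_match_accuracy` are hypothetical names chosen for this example; the actual data schema and metric used by the authors may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical multiple-choice item (not the benchmark's real schema)."""
    question: str
    gold: frozenset[str]  # labels of all correct choices


def exact_match_accuracy(items: list[Item], predictions: list[set[str]]) -> float:
    """Return the fraction of items whose predicted label set equals the gold set."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        1 for item, pred in zip(items, predictions) if item.gold == frozenset(pred)
    )
    return correct / len(items)


# Placeholder data for illustration only; these are not real exam questions.
items = [
    Item("placeholder question 1", frozenset({"2"})),
    Item("placeholder question 2", frozenset({"1", "4"})),
]
predictions = [{"2"}, {"1"}]  # the second prediction misses choice "4"
accuracy = exact_match_accuracy(items, predictions)
```

Requiring the full label set to match is a strict choice: a partially correct answer to a multi-answer question earns no credit, which is one common convention for exam-style scoring but is an assumption here.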

Code & Models

Repositories

eques-inc/pharma-llm-eval (Official)

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Text Readability and Simplification