JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models

Junfeng Jiang; Jiahao Huang; Akiko Aizawa

arXiv:2409.13317·cs.CL·September 23, 2024

TL;DR

JMedBench introduces a comprehensive benchmark with datasets and evaluation tools to assess and compare Japanese biomedical large language models, highlighting current strengths and areas for improvement.

Contribution

This paper presents the first large-scale benchmark for Japanese biomedical LLMs, covering eight LLMs across four categories and 20 datasets across five tasks, along with evaluation tools and insights for future development.

Findings

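1. LLMs with a better understanding of Japanese and richer biomedical knowledge perform better on Japanese biomedical tasks.

2. LLMs not designed primarily for the Japanese biomedical domain can still perform unexpectedly well.

3. Existing LLMs leave considerable room for improvement on certain Japanese biomedical tasks.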

Abstract

Recent developments in Japanese large language models (LLMs) primarily focus on general domains, with fewer advancements in Japanese biomedical LLMs. One obstacle is the absence of a comprehensive, large-scale benchmark for comparison. Furthermore, the resources for evaluating Japanese biomedical LLMs are insufficient. To advance this field, we propose a new benchmark including eight LLMs across four categories and 20 Japanese biomedical datasets across five tasks. Experimental results indicate that: (1) LLMs with a better understanding of Japanese and richer biomedical knowledge achieve better performance in Japanese biomedical tasks, (2) LLMs that are not mainly designed for Japanese biomedical domains can still perform unexpectedly well, and (3) there is still much room for improving the existing LLMs in certain Japanese biomedical tasks. Moreover, we offer insights that could…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling
