CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark

Ningyu Zhang; Mosha Chen; Zhen Bi; Xiaozhuan Liang; Lei Li; Xin Shang; Kangping Yin; Chuanqi Tan; Jian Xu; Fei Huang; Luo Si; Yuan Ni; Guotong Xie; Zhifang Sui; Baobao Chang; Hui Zong; Zheng Yuan; Linfeng Li; Jun Yan; Hongying Zan; Kunli Zhang; Buzhou Tang; Qingcai Chen

arXiv:2106.08087 · cs.CL · November 2, 2022 · 24 cites

TL;DR

CBLUE is the first comprehensive Chinese biomedical language understanding benchmark. It provides diverse tasks and an evaluation platform to advance AI research in Chinese medical NLP, and its experiments reveal that current models still lag behind human performance.

Contribution

This paper introduces CBLUE, a new Chinese biomedical NLP benchmark with multiple tasks and an evaluation platform, filling a gap in non-English biomedical AI research.

Findings

01

Current models perform significantly worse than humans on CBLUE tasks.

02

The benchmark covers diverse biomedical NLP tasks including NER, information extraction, and classification.

03

Empirical results highlight the need for improved Chinese biomedical language models.

Abstract

Artificial Intelligence (AI), along with the recent progress in biomedical language understanding, is gradually changing medical practice. With the development of biomedical language understanding benchmarks, AI applications are widely used in the medical field. However, most benchmarks are limited to English, which makes it challenging to replicate many of the successes in English for other languages. To facilitate research in this direction, we collect real-world biomedical data and present the first Chinese Biomedical Language Understanding Evaluation (CBLUE) benchmark: a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification, and an associated online platform for model evaluation, comparison, and analysis. To establish evaluation on these tasks, we…
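Named entity recognition is the first task the abstract lists. As an illustration only, the sketch below computes micro-averaged, exact-match entity precision, recall and F1, a common way to score NER output. Whether the CBLUE platform scores NER exactly this way is an assumption, and the `(start, end, type)` entity tuples and the function name `entity_f1` are hypothetical.

```python
from typing import Sequence, Set, Tuple

# Hypothetical entity representation: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]


def entity_f1(
    gold: Sequence[Set[Entity]], pred: Sequence[Set[Entity]]
) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over a corpus.

    Each element of `gold` / `pred` is the set of entities for one sentence.
    A predicted entity is correct only if its start, end and type all match
    a gold entity in the same sentence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # exact matches
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (not CBLUE data): one span gets the wrong type in sentence 1.
gold = [{(0, 2, "disease"), (5, 7, "symptom")}, {(1, 3, "drug")}]
pred = [{(0, 2, "disease"), (5, 7, "disease")}, {(1, 3, "drug")}]
print(entity_f1(gold, pred))
```

Micro-averaging pools counts across all sentences before dividing, so longer documents with more entities weigh more than under a per-sentence average.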

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques

Methods: ALBERT · RoBERTa · BERT