Conceptualized Representation Learning for Chinese Biomedical Text Mining
Ningyu Zhang, Qianghuai Jia, Kangping Yin, Liang Dong, Feng Gao, Nengwei Hua

TL;DR
This paper introduces a novel conceptualized representation learning method tailored for Chinese biomedical text, enhancing the performance of pre-trained language models on specialized biomedical datasets, and provides a new benchmark for evaluation.
Contribution
It proposes a new conceptualized representation learning approach for Chinese biomedical text and releases a benchmark dataset, ChineseBLUE, for evaluating biomedical language understanding.
Findings
The proposed approach significantly improves model performance on ChineseBLUE.
Pre-trained models adapted with this method outperform standard models.
The ChineseBLUE benchmark facilitates future research in Chinese biomedical NLP.
Abstract
Biomedical text mining is becoming increasingly important as the number of biomedical documents and the volume of web data grow rapidly. Recently, word representation models such as BERT have gained popularity among researchers. However, it is difficult to estimate their performance on datasets containing biomedical texts because the word distributions of general and biomedical corpora are quite different. Moreover, the medical domain has long-tail concepts and terminologies that are difficult for language models to learn. For Chinese biomedical text, the task is even more difficult due to its complex structure and the variety of phrase combinations. In this paper, we investigate how the recently introduced pre-trained language model BERT can be adapted for Chinese biomedical corpora and propose a novel conceptualized representation learning approach. We also release a new Chinese Biomedical Language…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Linear Layer · Attention Dropout · Weight Decay · Adam · Dropout · WordPiece · Multi-Head Attention · Residual Connection · Softmax
