Data Augmentation Techniques for Chinese Disease Name Normalization
Wenqian Cui, Xiangling Fu, Shaohui Liu, Mingjun Gu, Xien Liu, Ji Wu, Irwin King

TL;DR
This paper introduces a novel data augmentation approach for Chinese disease name normalization that mitigates the scarcity of training data, with particularly large gains in low-data scenarios.
Contribution
The paper proposes a data augmentation approach designed specifically for Chinese disease name normalization. It combines a series of augmentation techniques with supporting modules and improves model performance when training data is limited.
Findings
Significant performance improvements across various baseline models and training objectives
Effective in scenarios with limited training data
A series of augmentation techniques and supporting modules for disease name normalization
Abstract
Disease name normalization is an important task in the medical domain. It classifies disease names written in various formats into standardized names, serving as a fundamental component in smart healthcare systems for various disease-related functions. Nevertheless, the most significant obstacle to existing disease name normalization systems is the severe shortage of training data. Consequently, we present a novel data augmentation approach that includes a series of data augmentation techniques and some supporting modules to help mitigate the problem. Through extensive experimentation, we illustrate that our proposed approach exhibits significant performance improvements across various baseline models and training objectives, particularly in scenarios with limited training data.
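The abstract does not spell out the individual augmentation techniques, so the following is only a generic illustration of what augmentation means for this task, not the authors' method. Normalization is framed here as mapping a raw mention to a standardized name, and augmentation as producing extra labeled (mention, standardized name) pairs from a small seed set. The function `augment_pairs`, its two operations (adding each standardized name as a mention of itself, and deleting one random character from existing mentions), and the example disease names are all hypothetical.

```python
import random
from typing import List, Tuple

# (raw disease mention, standardized disease name)
Pair = Tuple[str, str]


def augment_pairs(pairs: List[Pair], standard_names: List[str],
                  n_deletions: int = 1, seed: int = 0) -> List[Pair]:
    """Hypothetical, generic augmentation sketch (NOT the paper's method).

    Expands a small training set of (mention, standardized name) pairs.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    augmented: List[Pair] = list(pairs)

    # 1) Every standardized name is trivially a valid mention of itself.
    for name in standard_names:
        augmented.append((name, name))

    # 2) Noisy variants: drop one random character from each original mention,
    #    keeping the original label.
    for mention, label in pairs:
        for _ in range(n_deletions):
            if len(mention) > 1:
                i = rng.randrange(len(mention))
                augmented.append((mention[:i] + mention[i + 1:], label))

    # Remove exact duplicates while preserving the original order.
    seen = set()
    result: List[Pair] = []
    for pair in augmented:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


if __name__ == "__main__":
    # Illustrative example only: a raw mention mapped to a standardized name.
    seed_pairs = [("右肺上叶癌", "肺恶性肿瘤")]
    for p in augment_pairs(seed_pairs, ["肺恶性肿瘤"]):
        print(p)
```

With one seed pair and one standardized name, the sketch returns three pairs: the original, the self-mapped standardized name, and one character-deleted variant, all sharing the same label. Real systems would use far richer, linguistically informed operations; this only shows the input/output shape.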
Taxonomy
Topics: Traditional Chinese Medicine Studies · Biomedical Text Mining and Ontologies
