Data Augmentation Techniques for Chinese Disease Name Normalization

Wenqian Cui, Xiangling Fu, Shaohui Liu, Mingjun Gu, Xien Liu, Ji Wu, Irwin King

arXiv:2501.01195 · cs.CL · January 3, 2025

TL;DR

This paper introduces a novel data augmentation approach for Chinese disease name normalization that mitigates the scarcity of training data and significantly improves performance, particularly in low-data scenarios.

Contribution

The paper proposes a new data augmentation approach, consisting of a series of augmentation techniques and supporting modules, designed specifically for Chinese disease name normalization. It improves model performance when training data is limited.

Findings

01. Significant performance improvements across baseline models
02. Effective in scenarios with limited training data
03. Versatile augmentation techniques for disease normalization

Abstract

Disease name normalization is an important task in the medical domain. It classifies disease names written in various formats into standardized names, serving as a fundamental component in smart healthcare systems for various disease-related functions. However, the most significant obstacle to existing disease name normalization systems is the severe shortage of training data. Consequently, we present a novel data augmentation approach that includes a series of data augmentation techniques and some supporting modules to help mitigate the problem. Through extensive experimentation, we illustrate that our proposed approach exhibits significant performance improvements across various baseline models and training objectives, particularly in scenarios with limited training data.
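
The abstract frames normalization as classifying free-form disease mentions into a fixed set of standardized names, with augmentation used to enlarge scarce training data. The sketch below illustrates only that general setup and is not the authors' method. It assumes a hypothetical three-name label set, uses random character deletion as a generic stand-in augmentation (the paper's actual techniques are not described in this summary), and replaces a trained model with a character n-gram nearest-neighbor classifier. All identifiers (`STANDARD_NAMES`, `augment`, `build_training_pairs`, `normalize`) are hypothetical.

```python
import random

# Hypothetical label set: a few standardized Chinese disease names
# (acute bronchitis, type 2 diabetes, essential hypertension).
# Illustrative only; not drawn from the paper's dataset.
STANDARD_NAMES = ["急性支气管炎", "2型糖尿病", "原发性高血压"]


def augment(name: str, n: int, p: float = 0.2, seed: int = 0) -> list[str]:
    """Generate up to n distinct noisy variants of a standard name by randomly
    deleting characters (a generic augmentation assumed for illustration)."""
    rng = random.Random(seed)
    variants: set[str] = set()
    for _ in range(100 * n):  # bounded number of attempts
        if len(variants) >= n:
            break
        kept = "".join(c for c in name if rng.random() >= p)
        if kept and kept != name:  # skip empty strings and exact copies
            variants.add(kept)
    return sorted(variants)


def build_training_pairs(names: list[str], n_per_name: int, seed: int = 0) -> list[tuple[str, int]]:
    """Pair every standard name and its synthetic variants with its label index."""
    pairs = []
    for label, name in enumerate(names):
        pairs.append((name, label))
        for variant in augment(name, n_per_name, seed=seed + label):
            pairs.append((variant, label))
    return pairs


def char_ngrams(text: str, n_max: int = 2) -> set[str]:
    """Character unigrams and bigrams, so no Chinese word segmentation is needed."""
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(mention: str, pairs: list[tuple[str, int]], names: list[str]) -> str:
    """Classify a mention by its most similar training example
    (a toy stand-in for a trained normalization model)."""
    grams = char_ngrams(mention)
    _, label = max(pairs, key=lambda pair: jaccard(grams, char_ngrams(pair[0])))
    return names[label]


pairs = build_training_pairs(STANDARD_NAMES, n_per_name=3)
print(normalize("支气管炎", pairs, STANDARD_NAMES))  # → 急性支气管炎
print(normalize("糖尿病", pairs, STANDARD_NAMES))    # → 2型糖尿病
```

The augmented pairs give short or partial mentions more training examples to match against; in a real system, such pairs would train a learned classifier rather than feed a nearest-neighbor lookup.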

Code & Models

Repositories

dreamtheater123/disease_name_dataset
Official · PyTorch

Taxonomy

Topics: Traditional Chinese Medicine Studies · Biomedical Text Mining and Ontologies