BioAug: Conditional Generation based Data Augmentation for Low-Resource Biomedical NER
Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, Dinesh Manocha

TL;DR
BioAug is a novel data augmentation framework for low-resource biomedical NER, leveraging a BART-based model trained on a text reconstruction task to generate diverse, factual augmentations that improve NER performance.
Contribution
The paper introduces BioAug, a new conditional generation-based data augmentation method specifically designed for low-resource biomedical NER tasks, addressing the lack of high-quality labeled data.
Findings
BioAug outperforms the baseline methods by 1.5%-21.5% (absolute improvement).
It generates more factual and diverse augmentations.
It is effective across 5 benchmark BioNER datasets.
Abstract
Biomedical Named Entity Recognition (BioNER) is the fundamental task of identifying named entities in biomedical text. However, BioNER suffers from severe data scarcity and lacks high-quality labeled data because annotation requires highly specialized expert knowledge. Though data augmentation has been shown to be highly effective for low-resource NER in general, existing data augmentation techniques fail to produce factual and diverse augmentations for BioNER. In this paper, we present BioAug, a novel data augmentation framework for low-resource BioNER. BioAug, built on BART, is trained to solve a novel text reconstruction task based on selective masking and knowledge augmentation. After training, we perform conditional generation and generate diverse augmentations by conditioning BioAug on selectively corrupted text, as in the training stage. We demonstrate the effectiveness…
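To make the "selective masking" idea in the abstract more concrete, here is a minimal Python sketch of how a labeled sentence might be corrupted before being passed to a BART-style model for reconstruction. The abstract does not say which tokens are masked or how knowledge augmentation is applied, so everything below is an assumption made for illustration: the function name `selectively_mask`, the `mask_rate` parameter, the BIO label scheme, and the choice to always keep entity tokens and mask only non-entity tokens are hypothetical and are not taken from the paper. The knowledge augmentation step is left out entirely.

```python
import random

MASK = "<mask>"  # the mask token string used by BART tokenizers


def selectively_mask(tokens, labels, mask_rate=0.5, seed=None):
    """Corrupt a sentence by masking non-entity tokens only.

    Assumption (not from the paper): tokens labeled "O" are masked with
    probability `mask_rate`, entity tokens are always kept, and runs of
    consecutive masked tokens collapse into a single mask token.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = random.Random(seed)
    corrupted = []
    for token, label in zip(tokens, labels):
        # Entity tokens are always kept; "O" tokens survive with
        # probability 1 - mask_rate.
        keep = label != "O" or rng.random() >= mask_rate
        if keep:
            corrupted.append(token)
        elif not corrupted or corrupted[-1] != MASK:
            # Start a new masked span; adjacent masks are merged.
            corrupted.append(MASK)
    return " ".join(corrupted)


# Hypothetical example sentence with BIO labels.
tokens = ["Mutations", "in", "BRCA1", "increase", "the", "risk", "of",
          "breast", "cancer"]
labels = ["O", "O", "B-Gene", "O", "O", "O", "O",
          "B-Disease", "I-Disease"]

# With mask_rate=1.0 every non-entity token is masked.
print(selectively_mask(tokens, labels, mask_rate=1.0))
# prints "<mask> BRCA1 <mask> breast cancer"
```

Under these assumptions, the corrupted string would be the model input and the original sentence the reconstruction target during training. At generation time, sampling a different corruption each time (by varying `seed`) is one plausible way to obtain varied augmentations from the same sentence.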
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding
