Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation
Gaurav Singh, Zahra Sabet, John Shawe-Taylor, James Thomas

TL;DR
This paper introduces a method to generate artificial labeled data using label descriptions to improve biomedical text tagging, especially in low-resource settings, demonstrating significant performance gains on PICO annotation.
Contribution
The authors propose a novel data augmentation strategy that constructs artificial training instances from label descriptions, enhancing fine-tuning for low-resource biomedical text tagging tasks.
Findings
Achieved state-of-the-art results on PICO annotation.
Significant improvements over baseline methods.
Effective data augmentation for low-resource biomedical NLP.
Abstract
Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue. A pre-trained encoder provides a representation of the input text, which is then fed to task-specific layers for classification. The entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Also, if the label space is large, it contains few or no labeled instances for the majority of the labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. 'Appearance of lesion on brain imaging'. To address these issues, we propose constructing extra labeled instances using label-text (i.e. label's name) as input for the…
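The idea of building training instances from label names can be illustrated with a minimal Python sketch. The abstract is cut off before it states what the label-text is paired with, so the sketch assumes, in line with the Contribution summary, that each label's name becomes one artificial instance tagged with that label alone. The function name `build_label_text_instances`, the multi-hot target format, and every label name except the one quoted in the abstract are hypothetical; this is not the authors' implementation.

```python
from typing import List, Tuple

Instance = Tuple[str, List[int]]  # (input text, multi-hot label vector)


def build_label_text_instances(label_names: List[str]) -> List[Instance]:
    """Turn each label's name into one artificial (text, target) pair."""
    num_labels = len(label_names)
    instances: List[Instance] = []
    for index, name in enumerate(label_names):
        target = [0] * num_labels
        # Assumption: the label text is tagged with its own label only.
        target[index] = 1
        instances.append((name, target))
    return instances


# Hypothetical label space; only the first name is taken from the abstract.
labels = ["Appearance of lesion on brain imaging", "Placebo", "Adults"]

# Hypothetical real training example for the first label.
real_data: List[Instance] = [
    ("Patients underwent MRI to assess lesions.", [1, 0, 0]),
]

# The artificial instances are simply appended to the fine-tuning set.
augmented = real_data + build_label_text_instances(labels)
```

Under these assumptions every label, including those with no annotated examples, contributes at least one instance to the fine-tuning set.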
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Radiomics and Machine Learning in Medical Imaging
