Using Weak Supervision and Data Augmentation in Question Answering
Chumki Basu, Himanshu Garg, Allen McIntosh, Sezai Sablak, John R. Wullert II

TL;DR
This paper investigates how weak supervision and data augmentation techniques can improve question answering models in the biomedical domain, especially during early COVID-19 research when annotated data was scarce.
Contribution
It introduces methods for generating training labels and QA pairs automatically using information retrieval, and applies curriculum learning for domain adaptation in biomedical QA.
Findings
Weak supervision signals from structured abstracts improve QA training.
Data augmentation with linguistic features enhances model robustness.
Curriculum learning aids in effective domain adaptation for COVID-19 QA.
Abstract
The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions. During the early days of the pandemic, one of the biggest challenges we faced was the lack of peer-reviewed biomedical articles on COVID-19 that could be used to train machine learning models for question answering (QA). In this paper, we explore the roles weak supervision and data augmentation play in training deep neural network QA models. First, we investigate whether labels generated automatically from the structured abstracts of scholarly papers using an information retrieval algorithm, BM25, provide a weak supervision signal to train an extractive QA model. We also curate new QA pairs using information retrieval techniques, guided by the clinicaltrials.gov schema and the structured abstracts of articles, in the absence of annotated data from…
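The weak-labeling idea in the abstract can be illustrated with a minimal sketch, assuming each sentence of a structured abstract is treated as a candidate answer and the top BM25 match for a question is kept as its weak label. This is not the authors' pipeline: the function names, the tokenizer, the parameter values k1=1.5 and b=0.75, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs, no stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 and b are common default values, not settings taken from the paper.
    """
    if not docs:
        raise ValueError("docs must not be empty")
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def weak_label(question, sentences):
    """Return the best-scoring sentence and its score as a weak answer label."""
    scores = bm25_scores(question, sentences)
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best], scores[best]


# Hypothetical structured-abstract sentences, invented for this sketch.
abstract_sentences = [
    "Background: Respiratory infections place a burden on hospitals.",
    "Methods: We enrolled adult patients in a randomized trial.",
    "Results: The treatment reduced recovery time in patients.",
]
label, score = weak_label("What did the treatment reduce?", abstract_sentences)
print(label, round(score, 3))
```

Because the label comes from lexical overlap alone, it is noisy by construction: a sentence can rank first without containing the answer, which is why such labels count as weak supervision rather than gold annotation.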
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
