Unsupervised Pre-training for Biomedical Question Answering
Vaishnavi Kommaraju, Karthick Gunasekaran, Kun Li, Trapit Bansal, Andrew McCallum, Ivana Williams, Ana-Maria Istrate

TL;DR
This paper introduces a pre-training task for biomedical text that improves question answering performance by teaching models to locate corrupted biomedical entity mentions in a context.
Contribution
The paper proposes a new de-noising pre-training task for biomedical language models that enhances their ability to answer biomedical questions.
Findings
Pre-training with the new task improves QA performance.
BioBERT outperforms previous models on the BioASQ challenge.
The method reduces the mismatch between the pre-training task and downstream biomedical QA tasks by requiring the model to predict spans.
Abstract
We explore the suitability of unsupervised representation learning methods on biomedical text -- BioBERT, SciBERT, and BioSentVec -- for biomedical question answering. To further improve unsupervised representations for biomedical QA, we introduce a new pre-training task from unlabeled data designed to reason about biomedical entities in the context. Our pre-training method consists of corrupting a given context by randomly replacing some mention of a biomedical entity with a random entity mention and then querying the model with the correct entity mention in order to locate the corrupted part of the context. This de-noising task enables the model to learn good representations from abundant, unlabeled biomedical text that helps QA tasks and minimizes the train-test mismatch between the pre-training task and the downstream QA tasks by requiring the model to predict spans. Our experiments…
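The corruption step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes entity mentions are already available as character spans, and the names `DenoisingExample`, `corrupt_context`, and the toy entity vocabulary are hypothetical. How the paper tags entities, samples replacements, or trains the span predictor is not stated in this excerpt.

```python
import random
from typing import List, NamedTuple, Tuple


class DenoisingExample(NamedTuple):
    query: str         # the original (correct) entity mention
    context: str       # the context after corruption
    answer_start: int  # character offset where the corrupted mention starts
    answer_end: int    # character offset just past the corrupted mention


def corrupt_context(
    context: str,
    mentions: List[Tuple[int, int]],
    entity_vocab: List[str],
    rng: random.Random,
) -> DenoisingExample:
    """Replace one entity mention with a random different one.

    `mentions` holds (start, end) character spans of entity mentions in
    `context`; how those spans are obtained is an assumption of this sketch.
    """
    start, end = rng.choice(mentions)
    original = context[start:end]
    candidates = [e for e in entity_vocab if e != original]
    if not candidates:
        raise ValueError("entity_vocab needs a mention other than the original")
    replacement = rng.choice(candidates)
    corrupted = context[:start] + replacement + context[end:]
    # The training target is the span of the replacement in the corrupted text.
    return DenoisingExample(original, corrupted, start, start + len(replacement))


# Toy usage with a made-up sentence and vocabulary.
context = "Aspirin inhibits cyclooxygenase in platelets."
mentions = [
    (context.find(m), context.find(m) + len(m))
    for m in ("Aspirin", "cyclooxygenase")
]
vocab = ["insulin", "hemoglobin", "warfarin"]
example = corrupt_context(context, mentions, vocab, random.Random(0))
print(example.query, "->", example.context[example.answer_start:example.answer_end])
```

Each example pairs the correct mention (the query) with the corrupted context and the span to predict, which has the same shape as an extractive QA example; that shared shape is the sense in which the abstract says the task narrows the gap between pre-training and downstream QA.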
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Advanced Text Analysis Techniques
