BIOMRC: A Dataset for Biomedical Machine Reading Comprehension
Petros Stavropoulos, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald

TL;DR
BIOMRC is a large cloze-style biomedical machine reading comprehension dataset with less noise than the earlier BIOREAD dataset. Neural models score much higher on it than on BIOREAD, and the best new BERT-based model reaches or surpasses biomedical expert accuracy in some experiments.
Contribution
The paper introduces BIOMRC, a new large-scale cloze-style biomedical MRC dataset built with care to reduce noise, along with a BERT-based model whose best version substantially outperforms all other methods tested and reaches or surpasses biomedical expert accuracy in some experiments.
Findings
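Two neural MRC models previously tested on BIOREAD perform much better on BIOMRC, while simple heuristics do not perform well on it.
Non-expert human performance is higher on BIOMRC than on BIOREAD, and biomedical experts perform better still.
The best version of the new BERT-based model substantially outperforms all other methods tested, reaching or surpassing expert accuracy in some experiments.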
The dataset is available in three sizes with accompanying code and leaderboard.
Abstract
We introduce BIOMRC, a large-scale cloze-style biomedical MRC dataset. Care was taken to reduce noise, compared to the previous BIOREAD dataset of Pappas et al. (2018). Experiments show that simple heuristics do not perform well on the new dataset, and that two neural MRC models that had been tested on BIOREAD perform much better on BIOMRC, indicating that the new dataset is indeed less noisy or at least that its task is more feasible. Non-expert human performance is also higher on the new dataset compared to BIOREAD, and biomedical experts perform even better. We also introduce a new BERT-based MRC model, the best version of which substantially outperforms all other methods tested, reaching or surpassing the accuracy of biomedical experts in some experiments. We make the new dataset available in three different sizes, also releasing our code, and providing a leaderboard.
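To make the cloze-style setup and the notion of a "simple heuristic" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the `@entityN` and `XXXX` placeholder conventions, the toy passage, and the most-frequent-candidate heuristic are hypothetical and are not taken from the released BIOMRC data or code.

```python
from collections import Counter

# Hypothetical toy instance. The schema and placeholder format are
# illustrative assumptions, not the actual BIOMRC file format.
instance = {
    "passage": (
        "@entity1 inhibits @entity2 in mice . "
        "@entity2 levels fell after treatment . "
        "@entity2 was unchanged by @entity3 ."
    ),
    "question": "XXXX reduces @entity2 in vivo .",  # XXXX marks the blank
    "candidates": ["@entity1", "@entity2", "@entity3"],
    "answer": "@entity1",
}


def most_frequent_candidate(inst):
    """Heuristic baseline: pick the candidate mentioned most often in the passage."""
    counts = Counter(t for t in inst["passage"].split() if t in inst["candidates"])
    # max() keeps the earliest candidate on ties.
    return max(inst["candidates"], key=lambda c: counts[c])


def accuracy(instances, predict):
    """Fraction of instances for which the predicted candidate equals the answer."""
    correct = sum(predict(inst) == inst["answer"] for inst in instances)
    return correct / len(instances)


prediction = most_frequent_candidate(instance)
score = accuracy([instance], most_frequent_candidate)
```

On this toy instance the heuristic selects the most frequently mentioned entity, which is not the entity that fills the blank. A single hand-made example says nothing about real accuracy figures; it only shows the kind of frequency shortcut that a cloze-style benchmark is meant to resist.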
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics
