TL;DR
This paper explores augmenting a human-made QA dataset with synthetic answerable and unanswerable questions, and shows that the unanswerable questions give the larger performance boost to a transformer-based model.
Contribution
It complements a well-known human-made QA dataset with synthetic answerable and unanswerable questions, and reports a tangible improvement in F1 and EM scores for the model trained on the mixed dataset.
Findings
Unanswerable questions improve F1 scores by 5.0%.
Synthetic data augmentation yields a 6.7% F1 score increase.
Unanswerable questions are more effective than answerable ones for model training.
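For readers unfamiliar with the metrics behind these figures, the sketch below shows one common way to compute exact match (EM) and token-level F1 for extractive QA, with an unanswerable question represented by an empty gold answer. This is an illustration under assumptions, not the paper's evaluation script: the function names (`normalize`, `exact_match`, `f1_score`) and the normalization steps (lowercasing, stripping punctuation and English articles) are hypothetical choices made here.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Assumed convention: an unanswerable question has an empty gold answer,
    # so the model is only correct if it also abstains (empty prediction).
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this convention, a model that answers an unanswerable question scores zero on both metrics for that example, which is one reason training on unanswerable questions can move the aggregate scores.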
Abstract
Question Answering (QA) is key to robust communication between humans and machines. Modern language models used for QA have surpassed human performance in several essential tasks; however, these models require large amounts of human-generated training data, which is costly and time-consuming to create. This paper studies augmenting human-made datasets with synthetic data as a way of surmounting this problem. A state-of-the-art model based on deep transformers is used to inspect the impact of using synthetic answerable and unanswerable questions to complement a well-known human-made dataset. The results indicate a tangible improvement in the performance of the language model (measured in terms of F1 and EM scores) trained on the mixed dataset. Specifically, unanswerable question-answers prove more effective in boosting the model: the F1 score gain from adding to the…
