Combining Data Generation and Active Learning for Low-Resource Question Answering
Maximilian Kimmich, Andrea Bartezzaghi, Jasmina Bogojeska, Cristiano Malossi, Ngoc Thang Vu

TL;DR
This paper introduces a combined data augmentation and active learning approach to improve question answering performance in low-resource, domain-specific settings with minimal human annotation effort.
Contribution
It proposes a novel method integrating question-answer generation with active learning to reduce annotation costs in low-resource QA domains.
Findings
Boosts QA performance with minimal labeled data
Effective human annotation strategies depend on the stage of active learning
Enhances low-resource domain adaptation for question answering
Abstract
Neural approaches have become very popular in Question Answering (QA); however, they require a large amount of annotated data. In this work, we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering at different stages, reducing the overall human annotation effort. For this purpose, we consider target domains in realistic settings, with an extremely small number of annotated samples but many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for…
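As an illustration of the Active Learning step the abstract refers to, the following is a minimal sketch of one common selection strategy, least-confidence sampling, in which the samples the model is least sure about are sent to human annotators first. It is not taken from the paper: the function name `select_for_annotation`, the sample ids, and the confidence scores are hypothetical, and the paper's actual scoring and selection strategy is not stated in this summary.

```python
from typing import Dict, List


def select_for_annotation(confidences: Dict[str, float], budget: int) -> List[str]:
    """Return the ids of the `budget` least-confident unlabeled samples.

    Hypothetical helper for illustration only; `confidences` maps a sample id
    to a model confidence score in [0, 1] (an assumed input format).
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Sort by confidence (ascending); break ties by id so the result is deterministic.
    ranked = sorted(confidences, key=lambda sid: (confidences[sid], sid))
    return ranked[:budget]


# Made-up confidence scores for generated question-answer pairs in an unlabeled pool.
pool = {"doc1-q1": 0.91, "doc1-q2": 0.35, "doc2-q1": 0.62, "doc2-q2": 0.12}
picked = select_for_annotation(pool, budget=2)
print(picked)  # ['doc2-q2', 'doc1-q2']
```

In a full loop, the selected samples would be annotated, added to the training set, and the model retrained before the next selection round.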
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Domain Adaptation and Few-Shot Learning
