Combining Data Generation and Active Learning for Low-Resource Question Answering

Maximilian Kimmich; Andrea Bartezzaghi; Jasmina Bogojeska; Cristiano Malossi; Ngoc Thang Vu

arXiv:2211.14880 · cs.CL · September 16, 2024 · 1 citation

TL;DR

This paper introduces a combined data augmentation and active learning approach to improve question answering performance in low-resource, domain-specific settings with minimal human annotation effort.

Contribution

It proposes a novel method integrating question-answer generation with active learning to reduce annotation costs in low-resource QA domains.

Findings

01. Boosts QA performance with minimal labeled data

02. Effective human annotation strategies depend on the stage of active learning

03. Enhances low-resource domain adaptation for question answering

Abstract

Neural approaches have become very popular in Question Answering (QA); however, they require a large amount of annotated data. In this work, we propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings, where the target domains are diverse in terms of difficulty and similarity to the source domain. We also investigate Active Learning for question answering in different stages, overall reducing the annotation effort of humans. For this purpose, we consider target domains in realistic settings, with an extremely low amount of annotated samples but with many unlabeled documents, which we assume can be obtained with little effort. Additionally, we assume that a sufficient amount of labeled data from the source domain is available. We perform extensive experiments to find the best setup for…

Code & Models

Repositories

mxschmdt/mrqa-gen-al
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Domain Adaptation and Few-Shot Learning