AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions

Son Quoc Tran; Gia-Huy Do; Phong Nguyen-Thuan Do; Matt Kretchmar; Xinya Du

arXiv:2309.05103·cs.CL·September 12, 2023

TL;DR

This paper introduces AGent, an automated pipeline for generating unanswerable questions to improve extractive question answering models, reducing manual annotation efforts and maintaining high data quality.

Contribution

AGent is a novel automated method for creating unanswerable questions, enhancing dataset generation for EQA without manual annotation.

Findings

1. Created unanswerable question sets with low error rates.
2. Models trained on these sets perform comparably to those trained on SQuAD 2.0.
3. Demonstrated effectiveness on multiple EQA benchmarks.

Abstract

The development of large, high-quality datasets and high-performing models has led to significant advancements in Extractive Question Answering (EQA). This progress has sparked considerable interest in exploring unanswerable questions within EQA. Training EQA models with unanswerable questions helps them avoid extracting misleading or incorrect answers for queries that lack valid responses. However, manually annotating unanswerable questions is labor-intensive. To address this, we propose AGent, a novel pipeline that automatically creates new unanswerable questions by re-matching a question with a context that lacks the information necessary for a correct answer. In this paper, we demonstrate the usefulness of the AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA. These created question sets exhibit…
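The abstract describes the core idea only at a high level: an answerable question is re-paired with a different context that lacks the information needed to answer it. The following is a minimal sketch of that re-matching idea under stated assumptions. The Jaccard token-overlap ranking, the plain answer-substring filter, and every name here (`Example`, `overlap`, `rematch_unanswerable`) are hypothetical illustrations, not the retrieval or filtering stages actually used by AGent.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    context: str
    answer: str  # empty string marks an unanswerable example


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately crude tokenizer (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two strings (hypothetical scorer)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rematch_unanswerable(examples: list[Example]) -> list[Example]:
    """Hypothetical re-matching heuristic, not the paper's pipeline:
    pair each answerable question with the most similar *other* context
    whose text does not contain the question's gold answer string."""
    # Deduplicate contexts while preserving their original order.
    contexts = list(dict.fromkeys(ex.context for ex in examples))
    created = []
    for ex in examples:
        # An empty answer is a substring of every context, so already
        # unanswerable inputs get no candidates and are skipped.
        candidates = [
            c for c in contexts
            if c != ex.context and ex.answer.lower() not in c.lower()
        ]
        if not candidates:
            continue
        # Keep the candidate most lexically similar to the question.
        best = max(candidates, key=lambda c: overlap(ex.question, c))
        created.append(Example(ex.question, best, ""))
    return created


# Toy SQuAD-style examples (made up for illustration).
data = [
    Example("Who wrote Hamlet?",
            "Hamlet is a tragedy written by William Shakespeare.",
            "William Shakespeare"),
    Example("What is the capital of France?",
            "Paris is the capital of France.",
            "Paris"),
    Example("When was Hamlet first performed?",
            "Hamlet was first performed around 1600 by Shakespeare's company.",
            "around 1600"),
]

created = rematch_unanswerable(data)
for ex in created:
    print(f"{ex.question!r} -> {ex.context!r}")
```

Ranking by similarity keeps the new context topically close to the question, which should make the pair harder than a random mismatch. The toy output also shows the weakness of a substring filter: "Who wrote Hamlet?" is re-paired with a context mentioning "Shakespeare's company", which arguably still hints at the answer, so any practical version would need a stronger check than this one.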

Code & Models

Repositories

sonqt/agent-unanswerable (official)

Datasets

sonquoctran/SQuAD_AGent

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications