Cooperative Self-training of Machine Reading Comprehension

Hongyin Luo; Shang-Wen Li; Mingye Gao; Seunghak Yu; James Glass

arXiv:2103.07449 · cs.CL · June 29, 2022

TL;DR

This paper introduces RGX, a cooperative self-training framework that automatically generates question-answer pairs to enhance machine reading comprehension models, reducing reliance on annotated data and outperforming existing state-of-the-art methods.

Contribution

The paper presents RGX, a novel self-training framework that combines question generation and answer extraction to improve question answering while reducing the need for annotated datasets.

Findings

01. RGX outperforms SOTA pretrained models on standard benchmarks.

02. RGX achieves new SOTA performance with limited model size.

03. The framework enables training on unannotated text corpora.

Abstract

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, training question answering models still requires large amounts of annotated data for specific domains. In this work, we propose a cooperative self-training framework, RGX, for automatically generating more non-trivial question-answer pairs to improve model performance. RGX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity Recognizer, a question Generator, and an answer eXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training…

Code & Models

Repositories

luohongyin/RGX (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications