Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

Holy Lovenia; Bryan Wilie; Willy Chung; Min Zeng; Samuel Cahyawijaya; Su Dan; Pascale Fung

arXiv:2203.16027·cs.CL·September 13, 2022

TL;DR

Clozer introduces a flexible sequence-tagging method for data augmentation in cloze-style reading comprehension, significantly improving task performance by better extracting answer spans without relying on heuristics.

Contribution

The paper presents Clozer, a novel extendable sequence-tagging approach for cloze answer extraction that enhances task-adaptive pre-training for reading comprehension.

Findings

01. Clozer outperforms existing methods and the oracle in MRC tasks.

02. Clozer effectively recognizes gold answers independently of heuristics.

03. Clozer yields significant performance gains in multiple-choice cloze-style MRC tasks.

Abstract

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and lifts performance by adapting unlabelled data to the downstream task. Unfortunately, existing adaptations mainly rely on deterministic rules that do not generalize well. Here, we propose Clozer, a sequence-tagging-based cloze answer extraction method for TAPT that can be extended to any cloze-style machine reading comprehension (MRC) downstream task. We experiment on multiple-choice cloze-style MRC tasks and show that Clozer makes TAPT significantly more effective at lifting model performance than both the oracle and the state-of-the-art, and that Clozer is able to recognize the gold answers independently of any heuristics.
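
To make the idea of sequence-tagging-based cloze answer extraction more concrete, the following is a minimal sketch of how token-level tags could be turned into cloze examples. It is an illustration only and not the authors' implementation: the B/I/O tag scheme, the `@placeholder` marker, and the function names `tagged_spans` and `make_cloze` are all assumptions introduced here, and the tags are hard-coded where a trained tagger would normally supply predictions.

```python
from typing import List, Tuple

# Hypothetical blank marker; the abstract does not say which token is used.
PLACEHOLDER = "@placeholder"


def tagged_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token indices, end exclusive, for each B/I run.

    Assumes a B/I/O scheme: "B" opens a span, "I" continues it, "O" closes it.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))  # a new "B" closes the open span
            start = i
        elif tag == "I" and start is not None:
            continue  # still inside the current span
        else:
            # "O", or a stray "I" with no open span, ends any open span
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def make_cloze(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Build one (cloze question, answer) pair per tagged answer span."""
    examples = []
    for start, end in tagged_spans(tags):
        question = tokens[:start] + [PLACEHOLDER] + tokens[end:]
        answer = tokens[start:end]
        examples.append((" ".join(question), " ".join(answer)))
    return examples


# Toy input: in practice the tags would come from a sequence-tagging model.
tokens = "The cat sat on the red mat".split()
tags = ["O", "B", "O", "O", "O", "B", "I"]
examples = make_cloze(tokens, tags)
for question, answer in examples:
    print(question, "->", answer)
```

Under these assumptions, each tagged span produces one cloze question whose answer is the masked span, which is the kind of example that could then be fed into task-adaptive pre-training.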
