Iterative Paraphrastic Augmentation with Discriminative Span Alignment

Ryan Culkin; J. Edward Hu; Elias Stengel-Eskin; Guanghui Qin; Benjamin Van Durme

arXiv:2007.00320·cs.CL·July 2, 2020

TL;DR

This paper presents a paraphrastic augmentation method that combines sentence-level lexically constrained paraphrasing with discriminative span alignment to expand existing language resources at scale.

Contribution

It introduces a framework for large-scale resource expansion from a small, manually produced seed corpus and illustrates it on the Berkeley FrameNet Project.

Findings

01

Generated 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.

02

Required roughly four days of collecting training data for the alignment model and approximately one day of parallel compute.

03

The approach supports both large-scale expansion of existing resources and rapid creation of new ones from a small, manually produced seed corpus.

Abstract

We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. Based on roughly four days of collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.
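The loop the abstract describes (paraphrase a sentence under a lexical constraint, align the original trigger span into the paraphrase, keep any new (Frame, Trigger) pair, repeat) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' implementation: `toy_paraphrase` stands in for a lexically constrained paraphrase model, `toy_align` stands in for the discriminative span aligner, the `TOY_SYNONYMS` table is made up, and banning every trigger already seen for the frame is an assumed form of the constraint. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A sentence with one frame-evoking trigger token (toy: single-token spans)."""
    frame: str
    tokens: tuple
    trigger_index: int

    @property
    def trigger(self):
        return self.tokens[self.trigger_index]


# Hypothetical substitution table standing in for a real paraphrase model.
TOY_SYNONYMS = {
    "bought": ["purchased", "acquired"],
    "purchased": ["acquired"],
    "acquired": [],
}


def toy_paraphrase(tokens, banned):
    """Rewrite the sentence so no banned word appears; None if impossible."""
    out = []
    for tok in tokens:
        if tok in banned:
            options = [s for s in TOY_SYNONYMS.get(tok, []) if s not in banned]
            if not options:
                return None
            out.append(options[0])
        else:
            out.append(tok)
    return tuple(out)


def toy_align(source_tokens, source_index, target_tokens):
    """Toy aligner: the toy paraphraser rewrites token-for-token, so positions match."""
    return source_index if source_index < len(target_tokens) else None


def augment(seed, rounds):
    """Iteratively paraphrase and align, collecting new (frame, trigger) pairs."""
    seen = {(seed.frame, seed.trigger)}
    results = [seed]
    frontier = [seed]
    for _ in range(rounds):
        next_frontier = []
        for ann in frontier:
            # Assumed constraint: forbid every trigger already seen for this frame.
            banned = {trig for (frame, trig) in seen if frame == ann.frame}
            paraphrase = toy_paraphrase(ann.tokens, banned)
            if paraphrase is None:
                continue
            index = toy_align(ann.tokens, ann.trigger_index, paraphrase)
            if index is None:
                continue
            new = Annotation(ann.frame, paraphrase, index)
            key = (new.frame, new.trigger)
            if key not in seen:
                seen.add(key)
                results.append(new)
                next_frontier.append(new)
        frontier = next_frontier
    return results


# The frame label is used purely for illustration.
seed = Annotation("Commerce_buy", ("She", "bought", "a", "car"), 1)
print([a.trigger for a in augment(seed, rounds=3)])
# → ['bought', 'purchased', 'acquired']
```

Each round feeds the previous round's output back in as input, which is what makes the augmentation iterative; the loop stops early once the toy paraphraser can no longer avoid the banned triggers.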
