Iterative Paraphrastic Augmentation with Discriminative Span Alignment
Ryan Culkin, J. Edward Hu, Elias Stengel-Eskin, Guanghui Qin, Benjamin Van Durme

TL;DR
This paper presents a paraphrastic augmentation method that combines sentence-level lexically constrained paraphrasing with discriminative span alignment to expand existing language resources at scale.
Contribution
It introduces a framework for expanding existing resources at large scale, or rapidly creating new ones, from a small, manually produced seed corpus, and illustrates it on the Berkeley FrameNet Project.
Findings
Generated 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.
Required roughly four days of collecting training data for the alignment model and approximately one day of parallel compute.
Demonstrated rapid and scalable resource creation for language understanding tasks.
Abstract
We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. Based on roughly four days of collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.
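The loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `toy_paraphrase` and `toy_align` are hypothetical stand-ins for the constrained paraphraser and the span alignment model, the synonym table is invented for illustration, and banning previously seen triggers is an assumption about how the lexical constraint is applied.

```python
from typing import Callable, List, Optional, Set, Tuple

Span = Tuple[int, int]  # token indices, end-exclusive

# Hypothetical synonym table standing in for a real paraphrase model.
SYNONYMS = ["bought", "purchased", "acquired"]


def toy_paraphrase(tokens: List[str], banned: Set[str]) -> Optional[List[str]]:
    """Stand-in paraphraser: swap banned words for the first allowed synonym."""
    for cand in SYNONYMS:
        if cand not in banned:
            return [cand if t in banned else t for t in tokens]
    return None  # no paraphrase satisfies the constraint


def toy_align(src: List[str], span: Span, tgt: List[str]) -> Optional[Span]:
    """Stand-in aligner: assumes the trigger keeps its position."""
    return span


def augment(
    tokens: List[str],
    span: Span,
    frame: str,
    paraphrase: Callable[[List[str], Set[str]], Optional[List[str]]],
    align: Callable[[List[str], Span, List[str]], Optional[Span]],
    rounds: int = 3,
) -> List[Tuple[str, str]]:
    """Iteratively paraphrase a sentence and collect new (frame, trigger) pairs."""
    banned: Set[str] = set()
    found: List[Tuple[str, str]] = []
    for _ in range(rounds):
        # Assumption: forbid every trigger seen so far, forcing a new wording.
        banned.add(" ".join(tokens[span[0]:span[1]]))
        new_tokens = paraphrase(tokens, banned)
        if new_tokens is None:
            break
        new_span = align(tokens, span, new_tokens)
        if new_span is None:
            break
        new_trigger = " ".join(new_tokens[new_span[0]:new_span[1]])
        if (frame, new_trigger) not in found:
            found.append((frame, new_trigger))
        # The paraphrase becomes the input to the next round.
        tokens, span = new_tokens, new_span
    return found


pairs = augment(["She", "bought", "a", "car"], (1, 2), "Commerce_buy",
                toy_paraphrase, toy_align)
print(pairs)
```

In this sketch each round feeds its own output back in as the next source sentence, which is what makes the procedure iterative; a real system would replace both stubs with trained models.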
