C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Yutai Hou; Sanyuan Chen; Wanxiang Che; Cheng Chen; Ting Liu

arXiv:2012.07004 · cs.CL · December 15, 2020 · 1 citation

TL;DR

C2C-GenDA is a novel data augmentation framework that jointly encodes and generates multiple semantically similar utterances to improve slot filling performance in spoken language understanding tasks.

Contribution

It introduces a cluster-to-cluster generation approach that enhances diversity and reduces duplication in data augmentation for slot filling.

Findings

01 Improves slot filling F-score by up to 13.6% on ATIS and Snips datasets.

02 Effectively enlarges training data with diverse, semantically consistent utterances.

03 Demonstrates significant gains with limited training data.

Abstract

Slot filling, a fundamental module of spoken language understanding, often suffers from insufficient quantity and diversity of training data. To remedy this, we propose a novel Cluster-to-Cluster generation framework for Data Augmentation (DA), named C2C-GenDA. It enlarges the training set by reconstructing existing utterances into alternative expressions while preserving their semantics. Unlike previous DA work, which reconstructs utterances one by one and independently of each other, C2C-GenDA jointly encodes multiple existing utterances with the same semantics and simultaneously decodes multiple unseen expressions. Jointly generating multiple new utterances allows the model to consider the relations between generated instances and encourages diversity. In addition, encoding multiple existing utterances gives C2C a wider view of existing expressions, helping to reduce generation that duplicates existing data.…
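The data-side idea in the abstract (grouping existing utterances that share the same semantics, and discarding generated utterances that merely repeat existing ones) can be illustrated with a minimal sketch. This is not the authors' code: the function names `delexicalize`, `build_clusters`, and `drop_duplicates` are hypothetical, the example utterances are made up, and the sketch assumes that "same semantics" can be approximated by "same set of slot names" over BIO-tagged, delexicalized utterances. The generation model itself is not shown.

```python
from collections import defaultdict


def delexicalize(tokens, tags):
    """Replace each BIO-tagged slot span with a single <slot_name> placeholder."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            out.append(token)
        elif tag.startswith("B-"):
            out.append("<" + tag[2:] + ">")
        # I- tags continue a span whose placeholder was already emitted.
    return " ".join(out)


def build_clusters(examples):
    """Group delexicalized utterances by their set of slot names.

    Assumption: two utterances count as semantically equivalent when they
    mention the same slots. Each group would form one input cluster.
    """
    clusters = defaultdict(list)
    for tokens, tags in examples:
        frame = frozenset(tag[2:] for tag in tags if tag.startswith("B-"))
        template = delexicalize(tokens, tags)
        if template not in clusters[frame]:
            clusters[frame].append(template)
    return dict(clusters)


def drop_duplicates(generated, existing):
    """Keep generated utterances that are neither in the existing data nor repeated."""
    seen = set(existing)
    kept = []
    for utterance in generated:
        if utterance not in seen:
            seen.add(utterance)
            kept.append(utterance)
    return kept


# Made-up training examples in (tokens, BIO tags) form.
examples = [
    ("show flights from boston to denver".split(),
     ["O", "O", "O", "B-from_city", "O", "B-to_city"]),
    ("i need to fly from new york to miami".split(),
     ["O", "O", "O", "O", "O", "B-from_city", "I-from_city", "O", "B-to_city"]),
    ("play some jazz".split(), ["O", "O", "B-genre"]),
]

clusters = build_clusters(examples)
flight_cluster = clusters[frozenset({"from_city", "to_city"})]

# Pretend a generator produced these candidates for the flight cluster.
candidates = [
    "show flights from <from_city> to <to_city>",      # copies existing data
    "any flights leaving <from_city> for <to_city>",   # new expression
    "any flights leaving <from_city> for <to_city>",   # repeated candidate
]
new_utterances = drop_duplicates(candidates, flight_cluster)
```

In this sketch the two flight utterances land in one cluster and the music request in another; of the three candidates, only the one that is both unseen and not a repeat survives the filter. Placeholders in the surviving templates would then need to be refilled with slot values to yield labeled training utterances.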

Code & Models

Repositories

Sanyuan-Chen/C2C-DA (PyTorch, official)

Videos

C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems