Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding
Yutai Hou, Yijia Liu, Wanxiang Che, Ting Liu

TL;DR
This paper introduces a sequence-to-sequence data augmentation method for dialogue language understanding that leverages semantic alternatives and diversity ranking to improve model performance with limited data.
Contribution
The paper proposes a data augmentation framework based on sequence-to-sequence generation with a diversity rank, improving dialogue language understanding when training data is limited.
Findings
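Achieved F-score improvements of 6.38 on the Airline Travel Information System dataset and 10.04 on a newly annotated Stanford Multi-turn, Multidomain Dialogue Dataset.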
Generated diverse utterances that improve language understanding.
Effective with only hundreds of training utterances.
Abstract
In this paper, we study the problem of data augmentation for language understanding in task-oriented dialogue systems. In contrast to previous work, which augments an utterance without considering its relation to other utterances, we propose a sequence-to-sequence generation based data augmentation framework that leverages an utterance's same-semantic alternatives in the training data. A novel diversity rank is incorporated into the utterance representation to make the model produce diverse utterances, and these diversely augmented utterances help to improve the language understanding module. Experimental results on the Airline Travel Information System dataset and a newly created semantic frame annotation on the Stanford Multi-turn, Multidomain Dialogue Dataset show that our framework achieves significant improvements of 6.38 and 10.04 F-scores respectively when only a training set of…
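The idea of pairing an utterance with its same-semantic alternatives can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_augmentation_pairs`, the `<rank_k>` token format, the frame strings, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, since the abstract does not specify how the diversity rank is computed or encoded. The sketch groups delexicalized utterances by semantic frame, ranks each utterance's alternatives from least to most similar, and emits (source, target) pairs that a sequence-to-sequence model could be trained on.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def build_augmentation_pairs(utterances):
    """Build (source, target) training pairs from same-frame utterances.

    utterances: list of (delexicalized_text, frame) tuples, where frame is
    any hashable description of the semantic frame (hypothetical format).
    """
    # Group distinct utterances that share the same semantic frame.
    groups = defaultdict(list)
    for text, frame in utterances:
        if text not in groups[frame]:
            groups[frame].append(text)

    pairs = []
    for texts in groups.values():
        for src in texts:
            alternatives = [t for t in texts if t != src]
            # Assumption: token-level similarity stands in for the paper's
            # diversity measure; the least similar alternative gets rank 1.
            alternatives.sort(
                key=lambda t: SequenceMatcher(None, src.split(), t.split()).ratio()
            )
            for rank, tgt in enumerate(alternatives, start=1):
                # Assumption: the rank is prepended to the source as a token.
                pairs.append((f"<rank_{rank}> {src}", tgt))
    return pairs


examples = [
    ("show me flights from <from_city> to <to_city>", "flight|from_city,to_city"),
    ("flights from <from_city> to <to_city> please", "flight|from_city,to_city"),
    ("i need to fly to <to_city> leaving <from_city>", "flight|from_city,to_city"),
    ("what is the weather in <city>", "weather|city"),
]
pairs = build_augmentation_pairs(examples)
```

Under these assumptions, a model trained on such pairs could be prompted with a chosen rank token at generation time to request a more or less divergent paraphrase, and the slot placeholders in the output would then be filled back in to produce new labeled utterances.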
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
