Sequence-to-Sequence Data Augmentation for Dialogue Language Understanding

Yutai Hou; Yijia Liu; Wanxiang Che; Ting Liu

arXiv:1807.01554·cs.CL·July 5, 2018·26 cites

TL;DR

This paper introduces a sequence-to-sequence data augmentation method for dialogue language understanding that leverages semantic alternatives and diversity ranking to improve model performance with limited data.

Contribution

The paper proposes a data augmentation framework based on sequence-to-sequence generation with a diversity rank, which improves dialogue language understanding when training data is limited.

Findings

01

Improved F-score by 6.38 on the Airline Travel Information System dataset and by 10.04 on the Stanford Multi-turn, Multi-domain Dialogue Dataset.

02

Generated diverse utterances that improve language understanding.

03

Effective with only hundreds of training utterances.

Abstract

In this paper, we study the problem of data augmentation for language understanding in task-oriented dialogue systems. In contrast to previous work, which augments an utterance without considering its relation to other utterances, we propose a sequence-to-sequence generation based data augmentation framework that leverages an utterance's alternatives with the same semantics in the training data. A novel diversity rank is incorporated into the utterance representation so that the model produces diverse utterances, and these diversely augmented utterances help to improve the language understanding module. Experimental results on the Airline Travel Information System dataset and a newly created semantic frame annotation on the Stanford Multi-turn, Multi-domain Dialogue Dataset show that our framework achieves significant improvements of 6.38 and 10.04 F-scores respectively when only a training set of…
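The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `<rank_k>` token format, the `dissimilarity` function (token overlap via `difflib`), the frame strings, and the slot placeholders are all assumptions chosen for illustration. The sketch groups utterances that share a semantic frame, ranks each utterance's alternatives from most to least dissimilar, and emits rank-tagged (source, target) pairs that a sequence-to-sequence model could be trained on.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def dissimilarity(a, b):
    """Token-level dissimilarity in [0, 1]; 0.0 means identical token sequences.

    Assumption: a simple stand-in for whatever diversity measure is actually used.
    """
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()


def build_training_pairs(examples):
    """Build rank-tagged (source, target) pairs from (utterance, frame) examples.

    Utterances sharing a frame are treated as semantic alternatives. For each
    source, the alternatives are sorted from most to least dissimilar, and the
    rank is prepended to the source as a hypothetical <rank_k> token.
    """
    clusters = defaultdict(list)
    for utterance, frame in examples:
        clusters[frame].append(utterance)

    pairs = []
    for utterances in clusters.values():
        for source in utterances:
            alternatives = [u for u in utterances if u != source]
            alternatives.sort(key=lambda u: dissimilarity(source, u), reverse=True)
            for rank, target in enumerate(alternatives, start=1):
                pairs.append((f"<rank_{rank}> {source}", target))
    return pairs


# Hypothetical toy data: utterances with slot values replaced by placeholders.
examples = [
    ("show me flights from <from_city> to <to_city>", "flight(from_city, to_city)"),
    ("flights from <from_city> to <to_city> please", "flight(from_city, to_city)"),
    ("is there a plane that goes to <to_city> leaving <from_city>", "flight(from_city, to_city)"),
    ("what is the weather in <city>", "weather(city)"),
]

pairs = build_training_pairs(examples)
for source, target in pairs:
    print(source, "->", target)
```

An utterance with no alternative in its cluster (the weather example here) yields no pairs, so this kind of augmentation only applies to frames that occur with more than one surface form.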

Code & Models

Repositories

AtmaHou/Seq2SeqDataAugmentationForLU
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques