Self-training with Two-phase Self-augmentation for Few-shot Dialogue Generation
Wanyu Du, Hanjie Chen, Yangfeng Ji

TL;DR
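This paper introduces a two-phase self-augmentation method for few-shot dialogue generation: it first selects the most informative meaning representations (MRs) by model prediction uncertainty, then generates responses by aggregating multiple perturbed latent representations of each MR, outperforming existing self-training methods.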
Contribution
The paper proposes a novel two-phase self-augmentation approach that improves pseudo-label quality for few-shot dialogue systems, addressing the problem that self-augmented data in self-training can be noisy or uninformative.
Findings
Outperforms existing self-training methods on two benchmark datasets, FewShotWOZ and FewShotSGD
Achieves higher automatic evaluation scores
Receives better human evaluation results
Abstract
In task-oriented dialogue systems, response generation from meaning representations (MRs) often suffers from limited training examples, due to the high cost of annotating MR-to-Text pairs. Previous works on self-training leverage fine-tuned conversational models to automatically generate pseudo-labeled MR-to-Text pairs for further fine-tuning. However, some self-augmented data may be noisy or uninformative for the model to learn from. In this work, we propose a two-phase self-augmentation procedure to generate high-quality pseudo-labeled MR-to-Text pairs: the first phase selects the most informative MRs based on the model's prediction uncertainty; with the selected MRs, the second phase generates accurate responses by aggregating multiple perturbed latent representations from each MR. Empirical experiments on two benchmark datasets, FewShotWOZ and FewShotSGD, show that our method generally…
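The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how uncertainty is measured or how latent representations are perturbed and aggregated, so the sketch assumes predictive entropy over several stochastic forward passes for phase one, and averaging of Gaussian-perturbed latent vectors for phase two. All function names and parameters (`predictive_entropy`, `select_informative`, `aggregate_perturbed`, `noise_std`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def predictive_entropy(prob_samples: Sequence[Sequence[float]]) -> float:
    """Entropy of the mean distribution over several stochastic forward passes.

    Assumption: uncertainty is measured as predictive entropy; the paper
    may use a different uncertainty estimate.
    """
    n = len(prob_samples)
    k = len(prob_samples[0])
    mean = [sum(sample[j] for sample in prob_samples) / n for j in range(k)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def select_informative(
    mrs: Sequence[str],
    uncertainty: Callable[[str], float],
    top_k: int,
) -> List[str]:
    """Phase 1 (sketch): keep the top_k MRs the model is least certain about."""
    return sorted(mrs, key=uncertainty, reverse=True)[:top_k]


def aggregate_perturbed(
    latent: Sequence[float],
    n_samples: int,
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Phase 2 (sketch): average several noisy copies of one latent vector.

    Assumption: perturbation is additive Gaussian noise and aggregation is
    a plain mean; both are illustrative choices, not taken from the paper.
    """
    total = [0.0] * len(latent)
    for _ in range(n_samples):
        for j, value in enumerate(latent):
            total[j] += value + rng.gauss(0.0, noise_std)
    return [t / n_samples for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy per-MR output distributions from three hypothetical stochastic passes.
    toy_passes = {
        "inform(name=Alpha)": [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1]],
        "request(area)": [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]],
    }
    picked = select_informative(
        list(toy_passes), lambda mr: predictive_entropy(toy_passes[mr]), top_k=1
    )
    print(picked)
    print(aggregate_perturbed([0.2, -0.4, 1.0], n_samples=8, noise_std=0.1, rng=rng))
```

In a real pipeline, the aggregated latent vector would be passed to the decoder to produce the pseudo-labeled response, and the selected MR-to-Text pairs would be added to the training set for the next round of fine-tuning.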
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
