TL;DR
This paper introduces two novel finetuning strategies for pretrained language models to generate diverse augmented data, significantly improving slot filling in spoken language understanding systems.
Contribution
It proposes value-based and context-based augmentation methods leveraging pretrained models, enhancing data diversity for SLU with limited labeled data.
Findings
Improved SLU performance on two datasets
Generated more diverse training sentences
Outperformed existing augmentation methods
Abstract
Spoken Language Understanding (SLU) is an essential step in building a dialogue system. Because labeled data is expensive to obtain, SLU suffers from data scarcity. In this paper, we therefore focus on data augmentation for the slot filling task in SLU, aiming to generate more diverse data from existing data. Specifically, we exploit the latent language knowledge in pretrained language models by finetuning them. We propose two strategies for the finetuning process: value-based and context-based augmentation. Experimental results on two public SLU datasets show that, compared with existing data augmentation methods, our proposed method generates more diverse sentences and significantly improves SLU performance.
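For readers unfamiliar with slot filling, the sketch below shows what augmenting a slot-tagged utterance can look like. It is not the paper's method: the abstract gives no implementation details of the value-based or context-based finetuning, so this is only a naive slot-value substitution baseline over BIO tags. The function name `substitute_slot_values`, the slot name `to_city`, and the example utterance and value pool are all hypothetical, chosen for illustration.

```python
import random


def substitute_slot_values(tokens, tags, value_pool, rng):
    """Return a new (tokens, tags) pair in which every BIO-tagged slot span
    is replaced by a value sampled from value_pool[slot].

    Assumption: value_pool maps a slot name to a list of token lists.
    Slots missing from the pool keep their original value.
    """
    new_tokens, new_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            slot = tag[2:]
            # Find the end of the span: B-slot followed by I-slot tags.
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + slot:
                j += 1
            original = tokens[i:j]
            replacement = rng.choice(value_pool.get(slot, [original]))
            new_tokens.extend(replacement)
            # Re-tag the replacement so the labels stay aligned with the tokens.
            new_tags.extend(["B-" + slot] + ["I-" + slot] * (len(replacement) - 1))
            i = j
        else:
            # Context tokens (tag "O") are copied unchanged.
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags


# Hypothetical example utterance and value pool.
tokens = "book a flight to new york".split()
tags = ["O", "O", "O", "O", "B-to_city", "I-to_city"]
pool = {"to_city": [["boston"], ["san", "francisco"]]}

aug_tokens, aug_tags = substitute_slot_values(tokens, tags, pool, random.Random(0))
print(list(zip(aug_tokens, aug_tags)))
```

A substitution baseline like this can only recombine slot values that already exist in the pool and never changes the surrounding wording. That limitation is the kind of gap the abstract points to when it proposes using pretrained language models to generate more diverse sentences.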
