Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase
Akhila Yerukola, Mason Bretan, Hongxia Jin

TL;DR
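This paper presents a data augmentation method that uses BERT-based rephrasing to improve spoken language understanding in voice assistants. Evaluated against VAEs, synonym replacement, and back-translation, it performs strongly on domain and intent classification and in a user study of utterance naturalness and semantic similarity.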
Contribution
The authors introduce an augmentation technique that combines byte pair encoding with a BERT-like self-attention model to rephrase utterances and so improve voice-assistant NLU performance.
Findings
The method improves domain and intent classification accuracy for a voice assistant.
It outperforms VAEs, synonym replacement, and back-translation.
In a user study, it also performs strongly on utterance naturalness and semantic similarity.
Abstract
We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method with a range of augmentation techniques encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity.
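For readers unfamiliar with the baselines, the sketch below illustrates synonym replacement, one of the comparison techniques named in the abstract. It is not the authors' BERT-based method, and it is not taken from the paper. The synonym table, the function names (synonym_replace, augment), the replacement probability p, and the example intent label are all hypothetical, chosen only to show how a labeled utterance can be expanded into several label-preserving variants.

```python
import random

# Hypothetical toy synonym table (an assumption for illustration only);
# a real system would draw on a lexical resource or learned embeddings.
SYNONYMS = {
    "play": ["start", "put on"],
    "song": ["track", "tune"],
    "turn": ["switch"],
    "lights": ["lamps"],
}


def synonym_replace(utterance, synonyms, p=0.5, rng=None):
    """Return a copy of `utterance` with some known words swapped for synonyms.

    Each word found in `synonyms` is replaced with probability `p`;
    all other words are kept unchanged.
    """
    rng = rng or random.Random()
    out = []
    for word in utterance.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


def augment(utterance, label, synonyms, n=3, p=0.5, seed=0):
    """Generate up to `n` distinct augmented (utterance, label) pairs.

    The label is copied unchanged, on the assumption that swapping
    synonyms does not alter the domain or intent of the utterance.
    """
    rng = random.Random(seed)
    seen = {utterance}
    pairs = []
    for _ in range(n * 10):  # bounded number of attempts
        candidate = synonym_replace(utterance, synonyms, p, rng)
        if candidate not in seen:
            seen.add(candidate)
            pairs.append((candidate, label))
        if len(pairs) == n:
            break
    return pairs


if __name__ == "__main__":
    for text, intent in augment("play a song", "music.play", SYNONYMS):
        print(intent, "|", text)
```

Word-level substitution of this kind leaves sentence structure untouched, which is one reason it serves as a baseline for rephrasing methods such as the one the abstract describes.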
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
