Data Augmentation for Voice-Assistant NLU using BERT-based Interchangeable Rephrase

Akhila Yerukola; Mason Bretan; Hongxia Jin

arXiv:2104.08268·cs.CL·April 19, 2021

TL;DR

This paper presents a data augmentation method that uses BERT-based rephrasing to improve spoken language understanding in voice assistants. It performs strongly against other augmentation techniques on domain and intent classification and in a user study of utterance naturalness and semantic similarity.

Contribution

The authors introduce a BERT-based rephrasing augmentation technique that enhances voice assistant NLU performance beyond existing methods.

Findings

01 The method improves domain and intent classification accuracy.

02 It outperforms VAEs, synonym replacement, and back-translation.

03 A user study shows increased utterance naturalness and semantic consistency.

Abstract

We introduce a data augmentation technique based on byte pair encoding and a BERT-like self-attention model to boost performance on spoken language understanding tasks. We compare and evaluate this method with a range of augmentation techniques encompassing generative models such as VAEs and performance-boosting techniques such as synonym replacement and back-translation. We show our method performs strongly on domain and intent classification tasks for a voice assistant and in a user study focused on utterance naturalness and semantic similarity.
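
The abstract names byte pair encoding (BPE) as one building block of the method. As an illustration only, here is a minimal sketch of generic BPE merge learning on a few toy utterances. It is not the authors' tokenizer or rephrase model, neither of which is described on this page. The function names (`learn_bpe_merges`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the sample utterances are all assumptions made for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(utterances, num_merges):
    """Learn up to `num_merges` merge rules from whitespace-split utterances."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    word_freqs = Counter()
    for utterance in utterances:
        for word in utterance.lower().split():
            word_freqs[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        merged = Counter()
        for symbols, freq in word_freqs.items():
            merged[merge_pair(symbols, best)] += freq
        word_freqs = merged
    return merges


def segment(word, merges):
    """Split `word` into subword units by replaying the learned merges in order."""
    symbols = tuple(word.lower()) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical voice-assistant utterances, used only to exercise the sketch.
    toy_utterances = ["play some music", "play the playlist", "replay that song"]
    rules = learn_bpe_merges(toy_utterances, num_merges=10)
    print(rules)
    print(segment("playing", rules))
```

Replaying the merges in the order they were learned keeps segmentation deterministic, and a word never seen during learning still breaks down into known subword units or single characters.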

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis