End-to-End Spoken Language Understanding for Generalized Voice Assistants

Michael Saxon; Samridhi Choudhary; Joseph P. McKenna; Athanasios Mouchtaris

arXiv:2106.09009·cs.CL·October 8, 2021

TL;DR

This paper introduces a fully differentiable, transformer-based, hierarchical end-to-end SLU system for commercial voice assistants that improves accuracy on complex and unseen intent-argument combinations.

Contribution

The authors develop a hierarchical end-to-end model for generalized SLU that can be pretrained at both the ASR and NLU levels. It handles a diverse set of intent and argument combinations and outperforms baselines on a complex internal VA dataset.

Findings

01  43% accuracy improvement over baselines on a complex internal generalized VA dataset

02  Meets 99% accuracy on Fluent Speech Commands

03  Nearly 20% improvement on unseen slot arguments

Abstract

End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model. Previous work in this area has focused on targeted tasks in fixed domains, where the output semantic structure is assumed a priori and the input speech is of limited complexity. In this work we present our approach to developing an E2E model for generalized SLU in commercial voice assistants (VAs). We propose a fully differentiable, transformer-based, hierarchical system that can be pretrained at both the ASR and NLU levels. This is then fine-tuned on both transcription and semantic classification losses to handle a diverse set of intent and argument combinations. This leads to an SLU system that achieves significant improvements over baselines on a complex internal generalized VA dataset with a 43% improvement in accuracy, while still meeting the 99%…
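The abstract says the model is fine-tuned on both transcription and semantic classification losses. The sketch below illustrates one common way such a joint objective can be formed, as a weighted sum of a token-level cross-entropy and an intent-level cross-entropy. It is an illustration only, not the authors' implementation: the function names, the use of plain cross-entropy for the transcription term, and the `asr_weight` and `nlu_weight` values are all assumptions.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def joint_loss(token_logits, token_targets, intent_logits, intent_target,
               asr_weight=0.5, nlu_weight=0.5):
    """Hypothetical joint objective: weighted sum of two loss terms.

    token_logits  : list of per-token logit lists (transcription output)
    token_targets : list of gold token indices, same length
    intent_logits : logits over intent classes for the whole utterance
    intent_target : gold intent index
    The weights are assumed values, not taken from the paper.
    """
    # Transcription term: mean cross-entropy over the output tokens.
    asr_term = sum(
        cross_entropy(logits, target)
        for logits, target in zip(token_logits, token_targets)
    ) / len(token_targets)
    # Semantic term: cross-entropy over the intent classes.
    nlu_term = cross_entropy(intent_logits, intent_target)
    return asr_weight * asr_term + nlu_weight * nlu_term


# Toy usage: two output tokens over a 2-word vocabulary, three intents.
loss = joint_loss(
    token_logits=[[0.0, 0.0], [0.0, 0.0]],
    token_targets=[0, 1],
    intent_logits=[0.0, 0.0, 0.0],
    intent_target=2,
)
```

Because both terms are differentiable functions of the model outputs, a single backward pass through a sum like this would update the speech and semantic layers together, which is the property a fully differentiable hierarchy relies on.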
