Deliberation Model for On-Device Spoken Language Understanding

Duc Le; Akshat Shrivastava; Paden Tomasello; Suyoun Kim; Aleksandr Livshits; Ozlem Kalinli; Michael L. Seltzer

arXiv:2204.01893 · cs.CL · September 8, 2022

TL;DR

This paper introduces a deliberation-based end-to-end spoken language understanding (SLU) system in which ASR and NLU share parameters. The design improves accuracy and robustness to ASR errors while remaining suitable for on-device deployment.

Contribution

The paper presents a novel deliberation model that integrates ASR and NLU with shared parameters, supports complex compositional semantic structures, and is robust enough for resource-constrained (on-device) environments.

Findings

01 Outperforms strong pipeline NLU baselines by 0.60% to 0.65% on STOP, the spoken version of the TOPv2 dataset

02 Fusion of text and audio features improves robustness to ASR errors

03 Reduces the performance degradation incurred when synthetic speech is used for training

Abstract

We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings. By formulating E2E SLU as a generalized decoder, our system is able to support complex compositional semantic structures. Furthermore, the sharing of parameters between ASR and NLU makes the system especially suitable for resource-constrained (on-device) environments; our proposed approach consistently outperforms strong pipeline NLU baselines by 0.60% to 0.65% on the spoken version of the TOPv2 dataset (STOP). We demonstrate that the fusion of text and audio features, coupled with the system's ability to rewrite the first-pass hypothesis,…
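
The two-pass idea in the abstract, where the second-pass NLU component conditions on both the text and the audio embeddings from ASR, can be illustrated with a minimal sketch. The abstract does not say how the two are fused, so the sketch assumes scaled dot-product attention followed by concatenation. The function name `fuse_text_and_audio`, the tensor shapes, and the random inputs are all hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_text_and_audio(text_emb, audio_emb):
    """Hypothetical fusion step (an assumption, not the paper's method).

    text_emb:  (num_tokens, dim) embeddings of the first-pass ASR hypothesis
    audio_emb: (num_frames, dim) audio encoder outputs
    Returns:   (num_tokens, 2 * dim) array in which each token embedding is
               concatenated with an attention-weighted summary of the audio.
    """
    dim = text_emb.shape[-1]
    # Each first-pass token scores every audio frame.
    scores = text_emb @ audio_emb.T / np.sqrt(dim)  # (num_tokens, num_frames)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    audio_context = weights @ audio_emb             # (num_tokens, dim)
    # A second-pass decoder would consume this fused representation.
    return np.concatenate([text_emb, audio_context], axis=-1)


# Random stand-ins for real encoder outputs (illustrative only).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(5, 8))    # 5 first-pass tokens, dim 8
audio_emb = rng.normal(size=(20, 8))  # 20 audio frames, dim 8
fused = fuse_text_and_audio(text_emb, audio_emb)
print(fused.shape)
```

Because the fused representation keeps the original token embeddings alongside an audio summary, a downstream decoder in such a setup could in principle recover from a misrecognized word by consulting the audio evidence.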

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing