Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

Milind Rao; Anirudh Raju; Pranav Dheram; Bach Bui; Ariya Rastrow

arXiv:2008.06173·cs.CL·February 16, 2021

TL;DR

This paper explores methods for joint speech recognition and understanding. It proposes three approaches: models that extract intent directly from speech, a compositional model that generates a transcript and then interprets it, and a joint model that connects ASR and NLU through a fully neural interface. The goal is to improve accuracy and efficiency in voice assistant systems.

Contribution

It introduces a neural interface for joint ASR and NLU, and compares direct, compositional, and fully joint models for speech understanding.

Findings

01

Joint models use semantic information from NLU to improve ASR accuracy.

02

Neural interfaces improve NLU by exposing it to ASR confusion (uncertainty between competing hypotheses) rather than only a single transcript.

03

End-to-end joint SLU models outperform separate systems.

Abstract

We consider the problem of spoken language understanding (SLU): extracting natural language intents and their associated slot arguments, or named entities, from speech that is primarily directed at voice assistants. Such a system subsumes both automatic speech recognition (ASR) and natural language understanding (NLU). An end-to-end joint SLU model can be built to a required specification, opening up the opportunity to deploy in hardware-constrained settings such as on-device, enabling voice assistants to work offline and in a privacy-preserving manner while also reducing server costs. We first present models that extract utterance intent directly from speech without intermediate text output. We then present a compositional model, which generates the transcript using the Listen, Attend and Spell (LAS) ASR system and then extracts the interpretation using a neural NLU model. Finally, we contrast these…
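To make the difference between a text interface (compositional model) and a neural interface concrete, here is a toy sketch. It is not the paper's model. It assumes a hypothetical ASR that emits one word posterior distribution per position and a hypothetical linear keyword scorer for NLU; the names `INTENT_WEIGHTS`, `text_interface`, `soft_interface`, and `classify_intent` are illustrative only. The compositional path collapses each position to its 1-best word before NLU sees it, while the soft path hands NLU the full distribution, so a word the ASR nearly chose can still influence the intent. In the paper the joint model passes learned hidden states and is trained end to end; per-word posteriors stand in for those here.

```python
from typing import Dict, List

# Hypothetical toy ASR output: one posterior distribution over words per position.
# A real neural interface would carry learned hidden states; posteriors are a stand-in.
Posteriors = List[Dict[str, float]]

# Hypothetical linear intent scorer: how much each word counts as evidence for an intent.
INTENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "PlayMusic": {"play": 1.0, "music": 0.5},
    "PayBill": {"pay": 1.0, "bills": 0.5},
}


def text_interface(posteriors: Posteriors) -> List[Dict[str, float]]:
    """Compositional-style interface: keep only the 1-best word at each position."""
    hard = []
    for dist in posteriors:
        best = max(dist, key=dist.get)
        hard.append({best: 1.0})  # one-hot: competing hypotheses are discarded
    return hard


def soft_interface(posteriors: Posteriors) -> List[Dict[str, float]]:
    """Soft interface: pass the full distribution so ASR confusion stays visible to NLU."""
    return [dict(dist) for dist in posteriors]


def classify_intent(features: List[Dict[str, float]]) -> str:
    """Score each intent by probability-weighted keyword evidence; return the top intent."""
    scores = {
        intent: sum(
            p * weights.get(word, 0.0)
            for dist in features
            for word, p in dist.items()
        )
        for intent, weights in INTENT_WEIGHTS.items()
    }
    return max(scores, key=scores.get)


# The user said "play music", but the ASR slightly prefers "pay" at position 0.
asr_output: Posteriors = [
    {"pay": 0.55, "play": 0.45},
    {"music": 0.9, "bills": 0.1},
]

print(classify_intent(text_interface(asr_output)))  # PayBill
print(classify_intent(soft_interface(asr_output)))  # PlayMusic
```

With the 1-best transcript "pay music" the scorer picks PayBill (1.0 vs 0.5); with the soft input, the 0.45 mass on "play" tips the scores to PlayMusic (0.9 vs 0.6). This only illustrates, in miniature, why exposing NLU to ASR confusion can help; it says nothing about the size of the effect reported in the paper.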