PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

Trang Le; Daniel Lazar; Suyoun Kim; Shan Jiang; Duc Le; Adithya Sagar; Aleksandr Livshits; Ahmed Aly; Akshat Shrivastava

arXiv:2406.07823·cs.CL·June 13, 2024

Open Access

TL;DR

PRoDeliberation introduces a non-autoregressive, parallel decoding approach for spoken language understanding that reduces latency by 2-10x relative to autoregressive models while remaining robust to speech recognition errors.

Contribution

The paper presents a CTC-based decoding strategy and a denoising training objective for non-autoregressive SLU models, improving speed over previous autoregressive deliberation systems while retaining their robustness to ASR errors.

Findings

1. Achieves a 2-10x latency reduction compared to autoregressive models.
2. Retains the ability to correct ASR mistranscriptions.
3. Overcomes limitations of small ASR devices through denoising training.

Abstract

Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation; however, these models have remained autoregressive, resulting in higher latencies. In this work, we introduce PRoDeliberation, a novel method leveraging a Connectionist Temporal Classification (CTC)-based decoding strategy as well as a denoising objective to train robust non-autoregressive deliberation models. We show that PRoDeliberation achieves the latency reduction of parallel decoding (2-10x improvement over autoregressive models) while retaining the ability to correct Automatic Speech Recognition (ASR) mistranscriptions of autoregressive deliberation systems. We further show that the design of the denoising training allows…
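
To make the latency argument concrete, here is a minimal sketch of generic greedy CTC decoding. It is not the paper's implementation: the function name `ctc_greedy_decode`, the blank index `0`, and the toy score matrix are all assumptions for illustration. The point it shows is that each output position is chosen independently of the others, so the per-position argmax can run in parallel rather than token by token as in an autoregressive decoder.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]], blank: int = BLANK
) -> List[int]:
    """Greedy CTC decoding over a [positions x vocab] score matrix.

    Each position is scored independently, so the argmax step has no
    left-to-right dependency and could be computed in a single batch.
    """
    # Step 1: pick the best label at every position independently.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # Step 2: collapse consecutive repeats, then drop blank symbols.
    return [label for label, _ in groupby(best) if label != blank]


# Toy example: vocabulary of 3 labels, where label 0 is the blank.
scores = [
    [0.10, 0.80, 0.10],  # label 1
    [0.20, 0.70, 0.10],  # label 1 (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two label-1 tokens)
    [0.10, 0.80, 0.10],  # label 1
    [0.10, 0.20, 0.70],  # label 2
    [0.10, 0.20, 0.70],  # label 2 (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
]
decoded = ctc_greedy_decode(scores)  # [1, 1, 2]
```

The blank between the first two groups is what lets the output contain the same label twice in a row; without it, the repeats would collapse into a single token.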

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics