Exploring Transfer Learning For End-to-End Spoken Language Understanding

Subendhu Rongali, Beiye Liu, Liwei Cai, Konstantine Arkoudas, Chengwei Su, and Wael Hamza

arXiv:2012.08549 · cs.CL · December 17, 2020

TL;DR

This paper introduces the AT-AT model, a multi-task end-to-end system that jointly trains on speech and text tasks, improving SLU performance and enabling zero-shot capabilities, with state-of-the-art results on multiple datasets.

Contribution

The paper presents a novel multi-task E2E model that leverages both speech and text data, outperforming single-task models and enabling zero-shot SLU without speech data.

Findings

1. Achieves state-of-the-art results on internal and public datasets.

2. Demonstrates effective zero-shot E2E SLU performance.

3. Outperforms models trained on limited data.

Abstract

Voice assistants such as Alexa, Siri, and Google Assistant typically use a two-stage Spoken Language Understanding (SLU) pipeline: first, an Automatic Speech Recognition (ASR) component processes customer speech and generates text transcriptions; then, a Natural Language Understanding (NLU) component maps the transcriptions to an actionable hypothesis. An end-to-end (E2E) system that goes directly from speech to a hypothesis is a more attractive option. Such systems have been shown to be smaller, faster, and better optimized. However, they require massive amounts of end-to-end training data and, in addition, don't take advantage of the already available ASR and NLU training data. In this work, we propose an E2E system that is designed to jointly train on multiple speech-to-text tasks, such as ASR (speech-transcription) and SLU (speech-hypothesis), and text-to-text tasks, such as NLU…
