RNN Transducer Models For Spoken Language Understanding

Samuel Thomas; Hong-Kwang J. Kuo; George Saon; Zoltán Tüske; Brian Kingsbury; Gakuto Kurata; Zvi Kons; Ron Hoory

arXiv:2104.03842 · cs.CL · April 9, 2021

TL;DR

This paper studies how to build and adapt RNN transducer (RNN-T) models for spoken language understanding in three data-availability settings. It shows that synthetic speech can stand in for missing real audio during adaptation and reports state-of-the-art results on the ATIS corpus and a customer call center data set.

Contribution

It introduces methods for building RNN-T SLU models from pre-trained ASR systems and adapting them in practical settings, including when only SLU labels and their values are annotated, and when transcripts exist without audio so that synthetic speech must be used.

Findings

01

RNN-T SLU models perform comparably to other end-to-end models.

02

Synthetic speech can effectively replace real audio for model adaptation.

03

State-of-the-art results are achieved on the ATIS corpus and a customer call center data set.

Abstract

We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available, a constrained case where the only available annotations are SLU labels and their values, and a more restrictive case where transcripts are available but not corresponding audio. We show how RNN-T SLU models can be developed starting from pre-trained automatic speech recognition (ASR) systems, followed by an SLU adaptation step. In settings where real audio data is not available, artificially synthesized speech is used to successfully adapt various SLU models. When evaluated on two SLU data sets, the ATIS corpus and a customer call center data set, the proposed models closely track the performance of other E2E models and achieve…
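The three data settings named in the abstract can be pictured as a small routing step that runs before SLU adaptation. The sketch below is illustrative only and is not taken from the paper or its repository: the `Utterance` record, the setting names, and the `value <label>` serialization are all assumptions made for this example, and the slot labels simply follow ATIS-style naming.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    """Hypothetical record for one SLU training example."""
    audio_path: Optional[str]      # None when no real audio exists
    transcript: Optional[str]      # None when only SLU labels are annotated
    slots: List[Tuple[str, str]]   # (label, value) pairs


def classify_setting(utt: Utterance) -> str:
    """Map an utterance to one of the three settings in the abstract."""
    has_audio = utt.audio_path is not None
    has_text = utt.transcript is not None
    if has_audio and has_text:
        return "transcript+audio"      # verbatim transcripts available
    if has_audio and not has_text:
        return "labels-only"           # only SLU labels and their values
    if has_text and not has_audio:
        return "transcript-no-audio"   # audio has to be synthesized
    raise ValueError("utterance has neither audio nor transcript")


def audio_source(utt: Utterance) -> str:
    """Real audio when present, otherwise synthesized speech."""
    return "real" if utt.audio_path is not None else "synthetic"


def label_target(utt: Utterance) -> str:
    """Assumed serialization of SLU labels and values as 'value <label>'."""
    return " ".join(f"{value} <{label}>" for label, value in utt.slots)


# Example usage with made-up data.
example = Utterance(
    audio_path=None,
    transcript="flights from denver to boston",
    slots=[("fromloc.city_name", "denver"), ("toloc.city_name", "boston")],
)
print(classify_setting(example), audio_source(example), label_target(example))
```

Keeping the setting decision separate from target construction means the same adaptation loop could, under these assumptions, consume all three kinds of data without branching inside the training code.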

Code & Models

Repositories

CoraJung/flexible-input-slu
pytorch
