Improving End-to-End Models for Set Prediction in Spoken Language Understanding

Hong-Kwang J. Kuo; Zoltan Tuske; Samuel Thomas; Brian Kingsbury; George Saon

arXiv:2201.12105·cs.CL·January 31, 2022

Open Access

TL;DR

This paper improves end-to-end spoken language understanding models for set prediction by proposing a data augmentation technique and an implicit attention-based alignment method, raising F1 scores when the spoken order of the entities is unknown.

Contribution

It introduces a novel data augmentation technique and an implicit attention-based alignment method that infers spoken order, improving E2E SLU models trained on unordered entity sets.

Findings

01 F1 scores increased by more than 11% for RNN transducers.

02 F1 scores increased by about 2% for attention-based encoder-decoders.

03 The proposed methods outperform previous results in set prediction accuracy.
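For readers unfamiliar with how set prediction is scored, the following is a minimal sketch of an order-independent F1, assuming each entity is a (label, value) pair and that a prediction counts as correct only on an exact match. The function name `set_f1` and the example entities are hypothetical; the page does not describe the paper's exact scoring script.

```python
from collections import Counter


def set_f1(reference, hypothesis):
    """Order-independent F1 over multisets of (label, value) entities.

    Assumption: an entity is correct only if label and value both match.
    """
    ref, hyp = Counter(reference), Counter(hypothesis)
    if not ref and not hyp:
        return 1.0  # nothing to find and nothing predicted
    # Multiset intersection counts each matched entity once.
    true_pos = sum((ref & hyp).values())
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(hyp.values())
    recall = true_pos / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: same entities, different order.
ref = [("fromCity", "boston"), ("toCity", "denver")]
hyp = [("toCity", "denver"), ("fromCity", "boston")]
score = set_f1(ref, hyp)
```

Because the comparison uses multisets, the order in which the model emits entities does not affect the score.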

Abstract

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition, which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible to train solely on semantic entities, which are far cheaper to collect than verbatim transcripts. We focus on this set prediction problem, where entity order is unspecified. Using two classes of E2E models, RNN transducers and attention-based encoder-decoders, we show that these models work best when the training entity sequence is arranged in spoken order. To improve E2E SLU models when entity spoken order is unknown, we propose a novel data augmentation technique along with an implicit attention-based alignment method to infer the spoken order. F1 scores significantly increased by more than 11% for RNN-T and about 2% for attention based…
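The idea of inferring spoken order from attention can be illustrated with a small sketch. It assumes that, for each entity, an attention-based model yields a weight distribution over encoder frames, and that entities can be sorted by the expected frame position of that distribution. The function `infer_spoken_order`, its inputs, and the sorting criterion are all illustrative assumptions, not the procedure described in the paper.

```python
def infer_spoken_order(entities, attention):
    """Reorder entities by where their attention mass falls in time.

    entities:  list of entity labels in arbitrary order.
    attention: list of per-entity weight lists over encoder frames
               (hypothetical input; one row per entity).
    """
    def expected_frame(weights):
        total = sum(weights)
        if total == 0:
            return float("inf")  # no attention evidence: place last
        return sum(i * w for i, w in enumerate(weights)) / total

    # sorted() is stable, so ties keep their original relative order.
    order = sorted(range(len(entities)),
                   key=lambda k: expected_frame(attention[k]))
    return [entities[k] for k in order]


# Hypothetical example: "toCity" attends late, "fromCity" attends early.
labels = ["toCity", "fromCity"]
weights = [[0.0, 0.1, 0.9], [0.8, 0.2, 0.0]]
reordered = infer_spoken_order(labels, weights)
```

The reordered sequence could then serve as a training target arranged in estimated spoken order, which the abstract reports as the arrangement these models handle best.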

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Speech and dialogue systems