End-to-end architectures for ASR-free spoken language understanding

Elisavet Palogiannidi; Ioannis Gkinis; George Mastrapas; Petr Mizera; Themos Stafylakis

arXiv:1910.10599·eess.AS·May 4, 2020·ICASSP

TL;DR

This paper presents recurrent end-to-end neural architectures for spoken language understanding that achieve state-of-the-art intent classification on the FSC dataset without relying on ASR-level targets or pretrained ASR models.

Contribution

The study introduces a set of recurrent architectures combined with data augmentation for end-to-end SLU, eliminating the need for ASR-level targets or pretrained ASR models.

Findings

01 Achieves state-of-the-art intent classification results on the FSC dataset.

02 Models generalize reasonably well to unseen wordings.

03 Data augmentation enhances model performance.

Abstract

Spoken Language Understanding (SLU) is the problem of extracting meaning from speech utterances. It is typically addressed as a two-step problem, where an Automatic Speech Recognition (ASR) model is employed to convert speech into text, followed by a Natural Language Understanding (NLU) model to extract meaning from the decoded text. Recently, end-to-end approaches have emerged, aiming at unifying ASR and NLU into a single SLU deep neural architecture, trained using combinations of ASR and NLU-level recognition units. In this paper, we explore a set of recurrent architectures for intent classification, tailored to the recently introduced Fluent Speech Commands (FSC) dataset, where intents are formed as combinations of three slots (action, object, and location). We show that by combining deep recurrent architectures with standard data augmentation, state-of-the-art results can be…
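
To make the intent definition concrete, here is a minimal Python sketch of an intent as a combination of the three slots named in the abstract. It is an illustration, not the authors' code: the `Intent` type, the `intent_accuracy` helper, and the example slot values are all assumptions, and it assumes that a prediction counts as correct only when all three slots match.

```python
from typing import NamedTuple, Sequence


class Intent(NamedTuple):
    """Hypothetical container: one intent is a triple of slot values."""
    action: str
    obj: str        # the "object" slot; renamed to avoid the Python builtin
    location: str


def intent_accuracy(predicted: Sequence[Intent], reference: Sequence[Intent]) -> float:
    """Fraction of utterances whose three slots are all predicted correctly.

    Assumption: exact match over the full (action, object, location) triple.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    # NamedTuple equality compares all three slots at once.
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Illustrative slot values only; they are not taken from the paper.
reference = [
    Intent("activate", "lights", "kitchen"),
    Intent("increase", "heat", "bedroom"),
]
predicted = [
    Intent("activate", "lights", "kitchen"),   # all three slots match
    Intent("increase", "heat", "kitchen"),     # location slot is wrong
]
accuracy = intent_accuracy(predicted, reference)
```

Under this exact-match assumption, a single wrong slot makes the whole intent wrong, so the second prediction above does not count as correct even though two of its three slots agree with the reference.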
