A Data Efficient End-To-End Spoken Language Understanding Architecture

Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, and Laurent Besacier

arXiv:2002.05955 · cs.CL · February 17, 2020 · 5 citations

Open Access

TL;DR

This paper presents a data-efficient end-to-end spoken language understanding (SLU) system whose acoustic, language, and semantic models are trained sequentially, with no pre-trained external module. It achieves competitive results on complex semantic tasks with limited training data.

Contribution

It introduces a novel incremental training approach for end-to-end SLU that does not rely on pre-trained external models, reducing data requirements.

Findings

01 Achieves a 24.02% Concept Error Rate (CER) on MEDIA/test without external data

02 Reaches competitive performance with a small training dataset

03 Trains the acoustic, language, and semantic models sequentially
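
The CER figure in the first finding can be made concrete with a small sketch. It assumes the definition commonly used for slot-filling evaluation: the edit distance (substitutions, deletions, insertions) between the predicted and reference concept sequences, divided by the number of reference concepts. The function name and the concept labels below are hypothetical illustrations, not taken from the paper or from the MEDIA annotation scheme.

```python
def concept_error_rate(reference, hypothesis):
    """Edit distance between two concept sequences, divided by reference length.

    Assumption: CER is computed like word error rate, but over concept labels.
    """
    if not reference:
        raise ValueError("reference must contain at least one concept")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j concepts of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_concept in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference concepts
        for j, hyp_concept in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_concept == hyp_concept else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical concept labels, for illustration only.
reference = ["command", "room-type", "date"]
hypothesis = ["command", "date"]
cer = concept_error_rate(reference, hypothesis)  # one deletion out of three concepts
```

Because insertions count as errors, a value computed this way can exceed 100% when the hypothesis is much longer than the reference.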

Abstract

End-to-end architectures have recently been proposed for spoken language understanding (SLU) and semantic parsing. Trained on large amounts of data, these models jointly learn acoustic and linguistic-sequential features. While such architectures give very good results for domain, intent and slot detection, applying them to more complex semantic chunking and tagging tasks is harder. For that reason, in many cases, models are combined with an external language model to enhance their performance. In this paper we introduce a data-efficient system that is trained end-to-end, with no additional pre-trained external module. One key feature of our approach is an incremental training procedure in which the acoustic, language and semantic models are trained sequentially, one after the other. The proposed model has a reasonable size and achieves competitive results with respect to…
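
The incremental training procedure mentioned in the abstract can be pictured as a staged loop. The following is a minimal, framework-free sketch of that control flow only, not the authors' implementation: `Stage`, `train_incrementally`, and `dummy_step` are hypothetical names, parameters are stand-in floats, and the choice to keep updating earlier parameters in later stages (rather than freezing them) is an assumption the abstract does not settle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]  # stand-in for real model weights


@dataclass
class Stage:
    name: str           # e.g. "acoustic", "language", "semantic"
    new_params: Params  # parameters introduced at this stage


def train_incrementally(
    stages: List[Stage],
    train_step: Callable[[str, Params], Params],
) -> Tuple[Params, List[str]]:
    """Train stages one after the other, each starting from the previous result."""
    model: Params = {}
    history: List[str] = []
    for stage in stages:
        # Keep everything learned so far and add this stage's fresh parameters.
        model = {**model, **stage.new_params}
        # Assumption: the whole stack is trained on the current stage's objective.
        model = train_step(stage.name, model)
        history.append(stage.name)
    return model, history


def dummy_step(stage_name: str, params: Params) -> Params:
    """Placeholder for a real training loop: nudges every parameter by 1.0."""
    return {key: value + 1.0 for key, value in params.items()}


stages = [
    Stage("acoustic", {"acoustic.w": 0.0}),
    Stage("language", {"language.w": 0.0}),
    Stage("semantic", {"semantic.w": 0.0}),
]
final_params, order = train_incrementally(stages, dummy_step)
```

With the placeholder step, the acoustic parameter is updated in all three stages and the semantic parameter only in the last, which is the property the staged schedule is meant to illustrate.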

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques