A Data Efficient End-To-End Spoken Language Understanding Architecture
Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, and Laurent Besacier

TL;DR
This paper presents a data-efficient end-to-end spoken language understanding system that trains sequentially without external modules, achieving competitive results on complex semantic tasks with limited data.
Contribution
It introduces a novel incremental training approach for end-to-end SLU that does not rely on pre-trained external models, reducing data requirements.
Findings
Achieves a 24.02% concept error rate (CER) on MEDIA/test without external data
Reaches competitive performance with a small training dataset
Employs sequential training of acoustic, language, and semantic models
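To make the CER figure above easier to interpret, here is a minimal sketch of how a concept error rate is commonly computed: the edit distance (substitutions, deletions, insertions) between reference and hypothesis concept sequences, divided by the number of reference concepts. This page does not describe the scoring tool the authors used, so the function names `edit_distance` and `concept_error_rate`, and the concept labels in the usage example, are illustrative assumptions rather than the paper's implementation.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the reference sequence into the hypothesis."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning ref[:i] with an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference concept
                curr[j - 1] + 1,     # insertion of a hypothesis concept
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def concept_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> float:
    """Corpus-level error rate in percent: total edit errors
    divided by the total number of reference concepts."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference contains no concepts")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_errors / total_ref


if __name__ == "__main__":
    # Hypothetical concept labels, for illustration only.
    refs = [["command", "room-type", "number"]]
    hyps = [["command", "number"]]
    print(f"CER = {concept_error_rate(refs, hyps):.2f}%")
```

Because the score is normalized by the reference length, insertions can push it above 100%, which is one reason the metric is reported at corpus level rather than per utterance.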
Abstract
End-to-end architectures have recently been proposed for spoken language understanding (SLU) and semantic parsing. Trained on large amounts of data, these models jointly learn acoustic and linguistic-sequential features. While such architectures give very good results for domain, intent, and slot detection, applying them to more complex semantic chunking and tagging tasks is harder. For this reason, they are often combined with an external language model to improve performance. In this paper, we introduce a data-efficient system that is trained end-to-end, with no additional pre-trained external module. One key feature of our approach is an incremental training procedure in which acoustic, language, and semantic models are trained sequentially, one after the other. The proposed model has a reasonable size and achieves competitive results with respect to…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
