End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

Edmilson Morais; Hong-Kwang J. Kuo; Samuel Thomas; Zoltan Tuske; Brian Kingsbury

arXiv:2011.08238·cs.CL·November 18, 2020

TL;DR

This paper demonstrates that self-supervised pre-trained acoustic features significantly improve end-to-end spoken language understanding with transformer networks, especially when combined with multi-task training, reducing reliance on pre-trained model initialization.

Contribution

It introduces a modular E2E SLU transformer architecture that effectively integrates self-supervised pre-trained acoustic features and multi-task training, advancing SLU performance.

Findings

01

Self-supervised features outperform filterbank features in SLU tasks.

02

Multi-task training with self-supervised features reduces the need for pre-trained model initialization.

03

The approach achieves state-of-the-art results on the ATIS dataset.

Abstract

Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in natural language processing (NLP); however, their merits in spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SLU architecture based on transformer networks, which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization and multi-task training. We perform several SLU experiments on the ATIS dataset, predicting intent and entity labels/values. These experiments investigate how pre-trained model initialization and multi-task training interact with either traditional filterbank features or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in…
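The abstract names multi-task training but does not say how the task losses are combined. One common way to realize it is a weighted sum of per-task losses computed from a shared encoder. The sketch below illustrates that general idea only, under stated assumptions: the auxiliary ASR task, the functions `cross_entropy` and `multitask_loss`, and the `asr_weight` parameter are all hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def multitask_loss(slu_loss: float, asr_loss: float, asr_weight: float = 0.5) -> float:
    """Hypothetical linear combination of an SLU loss and an auxiliary ASR loss.

    asr_weight is an assumed interpolation weight in [0, 1]; the paper's
    actual weighting scheme is not given in the abstract.
    """
    if not 0.0 <= asr_weight <= 1.0:
        raise ValueError("asr_weight must lie in [0, 1]")
    return (1.0 - asr_weight) * slu_loss + asr_weight * asr_loss


if __name__ == "__main__":
    # Intent loss from an illustrative 3-class distribution (not real model output).
    intent_loss = cross_entropy([0.7, 0.2, 0.1], target=0)
    asr_loss = 1.2  # placeholder value, not a result from the paper
    print(round(multitask_loss(intent_loss, asr_loss, asr_weight=0.3), 4))
```

Setting `asr_weight=0.0` reduces this to single-task SLU training, which makes it easy to compare the two regimes in the kind of ablation the abstract describes.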