Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
Pavel Denisov, Ngoc Thang Vu

TL;DR
This paper introduces a cross-modal teacher-student training approach that leverages pretrained speech and text embeddings to develop end-to-end spoken language understanding systems, reducing error propagation and improving performance.
Contribution
It presents a training method that combines the encoder of a pretrained speech recognition system with pretrained contextual embeddings in a cross-modal teacher-student framework, yielding an end-to-end spoken language understanding system.
Findings
Reaches performance comparable to the pipeline architecture without using any training data.
Outperforms the pipeline architecture after minimal fine-tuning on two of the three benchmarks.
Evaluates the approach on three benchmark datasets.
Abstract
Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps. These components are optimized independently to allow usage of available data, but the overall system suffers from error propagation. In this paper, we propose a novel training method that enables pretrained contextual embeddings to process acoustic features. In particular, we extend it with an encoder of pretrained speech recognition systems in order to construct end-to-end spoken language understanding systems. Our proposed method is based on the teacher-student framework across speech and text modalities that aligns the acoustic and the semantic latent spaces. Experimental results in three benchmarks show that our system reaches the performance comparable to the pipeline architecture without using any training data and outperforms it after…
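The alignment idea in the abstract can be illustrated with a minimal sketch: a speech-side student produces frame-level vectors, and its pooled output is pulled toward the embedding that a text-side teacher produces for the transcript. The abstract does not state the pooling or the loss, so mean pooling and mean squared error are assumptions here, and the names `mean_pool`, `mse`, and `alignment_loss` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level vectors into one utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def alignment_loss(student_frames: Sequence[Sequence[float]],
                   teacher_embedding: Sequence[float]) -> float:
    """Distance between the pooled student (speech) output and the
    teacher (text) embedding. Assumption: the paper may use a different
    pooling or distance; this only illustrates the teacher-student idea."""
    return mse(mean_pool(student_frames), teacher_embedding)


# Toy values standing in for real encoder outputs.
student_frames = [[1.0, 2.0], [3.0, 4.0]]   # two acoustic frames, dimension 2
teacher_embedding = [2.0, 3.0]               # text embedding of the transcript
loss = alignment_loss(student_frames, teacher_embedding)
```

In a real system the student would be trained to minimize this loss over paired speech and transcripts, so that the teacher's downstream understanding layers can consume speech-derived vectors.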
Taxonomy
Methods: Linear Layer · Multi-Head Attention · Residual Connection · Attention Is All You Need · Attention Dropout · Weight Decay · Adam · Softmax · WordPiece · Dense Connections
