Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification

Bidisha Sharma; Maulik Madhavi; Haizhou Li

arXiv:2102.07370 · cs.CL · February 16, 2021

Open Access

TL;DR

This paper introduces an intent classification framework that combines acoustic embeddings from a pretrained speech recognition model with linguistic embeddings from a pretrained language model. It uses knowledge distillation and cross-attention fusion, and reports 90.86% accuracy on ATIS and 99.07% on the Fluent Speech Commands corpus.

Contribution

It proposes a method that integrates acoustic and linguistic embeddings for intent classification, using knowledge distillation to map acoustic embeddings from a pretrained speech recognition model toward linguistic embeddings from a pretrained language model.
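The knowledge distillation step can be illustrated with a minimal sketch. It assumes that the acoustic (student) embedding passes through a single linear projection and is pulled toward the fixed linguistic (teacher) embedding with a mean-squared-error loss. The projection matrix, the MSE objective, and the helper names `distillation_loss` and `distill_step` are hypothetical illustrations, not the authors' implementation. A real system would use learned encoders and a framework's autograd rather than hand-written gradients.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distillation_loss(acoustic: Vector, linguistic: Vector, proj: Matrix) -> float:
    """Assumed KD objective: mean squared error between the projected acoustic
    (student) embedding and the fixed linguistic (teacher) embedding."""
    student = matvec(proj, acoustic)
    return sum((s - t) ** 2 for s, t in zip(student, linguistic)) / len(linguistic)


def distill_step(acoustic: Vector, linguistic: Vector, proj: Matrix,
                 lr: float = 0.1) -> Matrix:
    """One gradient-descent step on the hypothetical projection `proj`.
    Only the student side is updated; the teacher embedding stays fixed."""
    student = matvec(proj, acoustic)
    n = len(linguistic)
    # d(loss)/d(proj[i][j]) = (2 / n) * (student[i] - linguistic[i]) * acoustic[j]
    return [
        [proj[i][j] - lr * (2.0 / n) * (student[i] - linguistic[i]) * acoustic[j]
         for j in range(len(acoustic))]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy 3-d acoustic embedding and 2-d linguistic embedding (made-up numbers).
    acoustic = [1.0, 0.5, -0.5]
    linguistic = [0.2, -0.1]
    proj = [[0.0] * len(acoustic) for _ in linguistic]
    print("loss before:", distillation_loss(acoustic, linguistic, proj))
    for _ in range(50):
        proj = distill_step(acoustic, linguistic, proj)
    print("loss after:", distillation_loss(acoustic, linguistic, proj))
```

Because the projection is trained only to match the teacher, the acoustic branch can carry some linguistic structure into classification even when no transcript is available.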

Findings

1. Achieved 90.86% intent classification accuracy on the ATIS dataset.
2. Achieved 99.07% intent classification accuracy on the Fluent Speech Commands corpus.
3. Demonstrated the effectiveness of combining acoustic and linguistic features.

Abstract

Intent classification is a task in spoken language understanding. An intent classification system is usually implemented as a pipeline, with a speech recognition module followed by text processing that classifies the intents. There are also studies of end-to-end systems that take acoustic features as input and classify the intents directly. Such systems do not take advantage of relevant linguistic information and suffer from limited training data. In this work, we propose a novel intent classification framework that employs acoustic features extracted from a pretrained speech recognition system and linguistic features learned from a pretrained language model. We use a knowledge distillation technique to map the acoustic embeddings towards the linguistic embeddings. We fuse the acoustic and linguistic embeddings through a cross-attention approach to classify intents. With…
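The cross-attention fusion described in the abstract can be illustrated with a minimal sketch. It makes several assumptions. Acoustic frame embeddings serve as queries, and linguistic token embeddings serve as both keys and values. There are no learned query/key/value projections and no multiple heads. The attended sequence is mean-pooled, and a single linear layer scores the intents. None of these choices, and none of the function names (`cross_attention`, `mean_pool`, `classify_intent`), come from the abstract, which does not specify them.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows
    and returns the attention-weighted average of the value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def mean_pool(seq: Matrix) -> List[float]:
    """Average a sequence of vectors over the time/token axis."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def classify_intent(acoustic: Matrix, linguistic: Matrix, classifier: Matrix) -> int:
    """Hypothetical fusion head: acoustic frames query the linguistic tokens,
    the fused sequence is mean-pooled, and a linear layer scores each intent.
    Returns the index of the highest-scoring intent."""
    fused = cross_attention(acoustic, linguistic, linguistic)
    pooled = mean_pool(fused)
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in classifier]
    return max(range(len(logits)), key=logits.__getitem__)


if __name__ == "__main__":
    # Made-up 2-d embeddings: two linguistic tokens, two acoustic frames.
    linguistic = [[1.0, 0.0], [0.0, 1.0]]
    acoustic = [[2.0, 0.0], [1.5, 0.2]]
    classifier = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical intents
    print("predicted intent:", classify_intent(acoustic, linguistic, classifier))
```

Letting linguistic tokens query the acoustic frames instead is an equally plausible arrangement; the sketch only shows how attention weights let one modality select information from the other before classification.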

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Knowledge Distillation