Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification
Yidi Jiang, Bidisha Sharma, Maulik Madhavi, and Haizhou Li

TL;DR
This paper introduces a knowledge distillation approach from BERT to a speech transformer model for intent classification, improving accuracy and robustness in speech understanding tasks without relying on large speech datasets.
Contribution
It presents a novel multilevel transformer-based knowledge distillation method from BERT to speech models for intent classification, enhancing performance in low-resource and noisy conditions.
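To make the idea of multilevel distillation concrete, the following is a minimal sketch of a generic transformer-to-transformer distillation loss. It is not the authors' implementation: this summary does not state the paper's exact loss terms, layer mapping, or weights. The function names, the `temperature` and `alpha` parameters, and the assumption that each teacher (BERT) and student (speech transformer) layer has already been pooled to a vector of the same size are all hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multilevel_distillation_loss(teacher_hidden, student_hidden,
                                 teacher_logits, student_logits,
                                 temperature=2.0, alpha=0.5):
    """Combine hidden-state matching with soft-label matching.

    teacher_hidden / student_hidden: one pooled vector per matched layer
    (assumed here to have identical dimensions; a real system would need
    a projection and a way to align speech frames with text tokens).
    alpha: hypothetical weight between the two loss terms.
    """
    # Level 1: match intermediate representations layer by layer.
    hidden_loss = sum(
        mse(t, s) for t, s in zip(teacher_hidden, student_hidden)
    ) / len(teacher_hidden)

    # Level 2: match the softened intent distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_loss = kl_divergence(p_teacher, p_student) * temperature ** 2

    return alpha * hidden_loss + (1 - alpha) * logit_loss
```

When the student reproduces the teacher exactly, both terms vanish and the loss is zero; any mismatch in hidden vectors or predicted intent distribution makes it positive.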
Findings
Achieved 99.10% intent classification accuracy on the Fluent speech corpus.
Achieved 88.79% intent classification accuracy on the ATIS database.
Demonstrated improved robustness in acoustically degraded environments.
Abstract
End-to-end intent classification using speech has numerous advantages compared to the conventional pipeline approach, which uses automatic speech recognition (ASR) followed by natural language processing modules. It attempts to predict intent from speech without using an intermediate ASR module. However, such an end-to-end framework suffers from the unavailability of large speech resources with higher acoustic variation in spoken language understanding. In this work, we exploit the scope of the transformer distillation method that is specifically designed for knowledge distillation from a transformer-based language model to a transformer-based speech model. In this regard, we leverage the reliable and widely used bidirectional encoder representations from transformers (BERT) model as a language model and transfer the knowledge to build an acoustic model for intent classification using the…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Knowledge Distillation
