Temporal Knowledge Distillation for On-device Audio Classification
Kwanghee Choi, Martin Kersner, Jacob Morton, and Buru Chang

TL;DR
This paper introduces a novel temporal knowledge distillation method that transfers temporal information from large transformer models to on-device models, enhancing audio classification performance while maintaining architecture flexibility.
Contribution
A new knowledge distillation technique that captures temporal information via attention weights and applies to various architectures, improving on-device audio classification.
Findings
Improved accuracy on audio event detection and keyword spotting datasets.
Effective transfer of temporal knowledge to non-attention models.
Applicable to multiple architecture types without changing inference models.
Abstract
Improving the performance of on-device audio classification models remains a challenge given the computational limits of the mobile environment. Many studies leverage knowledge distillation to boost predictive performance by transferring knowledge from large models to on-device models. However, most either lack a mechanism to distill the essence of the temporal information, which is crucial to audio classification tasks, or require the large and on-device models to have similar architectures. In this paper, we propose a new knowledge distillation method designed to incorporate the temporal knowledge embedded in the attention weights of large transformer-based models into on-device models. Our distillation method is applicable to various types of architectures, including non-attention-based architectures such as CNNs or RNNs, while retaining the original network architecture during inference. Through extensive…
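To make the idea of distilling attention weights concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, hypothetically, that the teacher's attention has already been reduced to one weight per time frame (a distribution summing to 1) and that the student has an auxiliary head emitting one unnormalized score per frame, used only during training. The function names, the KL-divergence objective, and the `alpha` weighting are all illustrative assumptions, not details taken from the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of per-frame scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two distributions over time frames."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def temporal_distillation_loss(teacher_attention, student_scores):
    """Hypothetical loss: pull the student's per-frame weights toward the teacher's.

    teacher_attention: per-frame weights from the teacher (assumed to sum to 1).
    student_scores: unnormalized per-frame scores from an assumed auxiliary head.
    """
    student_attention = softmax(student_scores)
    return kl_divergence(teacher_attention, student_attention)


def total_loss(task_loss, teacher_attention, student_scores, alpha=0.5):
    """Combine the usual classification loss with the temporal term.

    alpha is an assumed weighting hyperparameter, not a value from the paper.
    """
    return task_loss + alpha * temporal_distillation_loss(
        teacher_attention, student_scores
    )


# Toy example: the teacher focuses on the middle frame of a three-frame clip.
teacher = [0.1, 0.6, 0.3]
matching_student = [math.log(w) for w in teacher]  # softmax recovers the teacher
uniform_student = [0.0, 0.0, 0.0]                  # student ignores time entirely

loss_matching = temporal_distillation_loss(teacher, matching_student)
loss_uniform = temporal_distillation_loss(teacher, uniform_student)
```

In this sketch the loss is close to zero when the student's temporal weights match the teacher's and positive when the student spreads its weight uniformly. Because the auxiliary scores are only consumed by the loss, such a head could be dropped at inference time, which is one way to read the abstract's claim that the original network architecture is retained.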
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Speech and Audio Processing
Methods: Knowledge Distillation
