AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting

Shreyan Ganguly; Roshan Nayak; Rakshith Rao; Ujan Deb; Prathosh AP

arXiv:2405.08019 · cs.LG · May 15, 2024

TL;DR

This paper introduces AdaKD, an adaptive loss-weighting method for knowledge distillation of automatic speech recognition (ASR) models. It adjusts the loss weights dynamically for each instance based on sample difficulty and improves performance over traditional fixed-weight distillation.

Contribution

The paper presents a novel adaptive loss-weighting technique, inspired by curriculum learning, that improves knowledge distillation by accounting for sample difficulty at the instance level.
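To make "instance-level weighting" concrete, a generic distillation objective over a batch of N samples can be written as a per-sample convex combination of the task loss and the distillation loss. The notation below (α_i as the distillation weight for sample i, f as a map from teacher loss to weight) is ours, assumed for illustration; the exact form and weighting function used in the paper may differ.

```latex
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\Big[(1-\alpha_i)\,\mathcal{L}_{\text{task}}^{(i)} + \alpha_i\,\mathcal{L}_{\text{KD}}^{(i)}\Big],
\qquad
\underbrace{\alpha_i = \alpha \ \ (\text{e.g. } \alpha = 0.5)}_{\text{conventional, fixed}}
\quad\text{vs.}\quad
\underbrace{\alpha_i = f\!\big(\mathcal{L}_{\text{teacher}}^{(i)}\big)}_{\text{adaptive, per instance}}
```

With α = 0.5 the two losses receive equal weight for every sample; an adaptive scheme instead lets each sample's weight depend on how difficult that sample is, here measured through the teacher's loss.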

Findings

01

AdaKD outperforms conventional knowledge distillation methods.

02

The adaptive weighting improves model performance across tasks.

03

The method is compatible with various task-specific and distillation objectives.

Abstract

Knowledge distillation, a widely used model compression technique, works by transferring knowledge from a cumbersome teacher model to a lightweight student model. The technique involves jointly optimizing the task-specific and knowledge distillation losses, with a weight assigned to each. Although these weights play a crucial role in the performance of the distillation process, current methods give equal weight to both losses, leading to suboptimal performance. In this paper, we propose Adaptive Knowledge Distillation, a novel technique inspired by curriculum learning that adaptively weighs the losses at the instance level. The technique rests on the notion that sample difficulty increases with teacher loss. Our method follows a plug-and-play paradigm and can be applied on top of any task-specific and distillation objectives. Experiments show that our method performs better…
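The contrast between equal (fixed) weighting and instance-level weighting driven by teacher loss can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the mapping `alpha_i = exp(-teacher_loss_i / tau)`, the temperature `tau`, and all function names (`adaptive_kd_weights`, `weighted_kd_loss`, `fixed_kd_loss`, `adaptive_kd_loss`) are hypothetical. The sketch also assumes that harder samples (higher teacher loss) should lean more on the ground-truth task loss and less on the teacher; the paper's actual weighting function may differ.

```python
import math
from typing import List, Sequence


def adaptive_kd_weights(teacher_losses: Sequence[float], tau: float = 1.0) -> List[float]:
    """Map each sample's teacher loss to a distillation weight alpha_i in [0, 1].

    Hypothetical mapping (an assumption, not taken from the paper):
    alpha_i = exp(-teacher_loss_i / tau). Low teacher loss (easy sample)
    gives alpha_i near 1; high teacher loss (hard sample) gives alpha_i near 0.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    return [math.exp(-loss / tau) for loss in teacher_losses]


def weighted_kd_loss(task_losses: Sequence[float],
                     kd_losses: Sequence[float],
                     alphas: Sequence[float]) -> float:
    """Batch mean of (1 - alpha_i) * task_loss_i + alpha_i * kd_loss_i."""
    if not task_losses or not (len(task_losses) == len(kd_losses) == len(alphas)):
        raise ValueError("loss and weight sequences must be non-empty and equal length")
    per_sample = [(1.0 - a) * t + a * k for t, k, a in zip(task_losses, kd_losses, alphas)]
    return sum(per_sample) / len(per_sample)


def fixed_kd_loss(task_losses: Sequence[float],
                  kd_losses: Sequence[float],
                  alpha: float = 0.5) -> float:
    """Conventional baseline: the same weight for every sample (0.5 = equal weighting)."""
    return weighted_kd_loss(task_losses, kd_losses, [alpha] * len(task_losses))


def adaptive_kd_loss(task_losses: Sequence[float],
                     kd_losses: Sequence[float],
                     teacher_losses: Sequence[float],
                     tau: float = 1.0) -> float:
    """Instance-level weighting: each sample's weight comes from its teacher loss."""
    return weighted_kd_loss(task_losses, kd_losses,
                            adaptive_kd_weights(teacher_losses, tau))


if __name__ == "__main__":
    task, kd = [2.0, 4.0], [1.0, 1.0]
    print(fixed_kd_loss(task, kd))                   # 2.0
    print(adaptive_kd_loss(task, kd, [0.0, 1e9]))    # 2.5
```

In the usage example, the first sample is "easy" for the teacher (loss 0, so alpha = 1) and is trained purely on the distillation loss, while the second is "hard" (very large teacher loss, so alpha underflows to 0) and is trained purely on the task loss. In a real ASR setup the per-sample losses would come from the framework (for example, per-utterance CTC losses computed without reduction), with the teacher's loss computed without gradients; the plug-and-play idea is that only the weights change, not the underlying objectives.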

Taxonomy

Topics: Fault Detection and Control Systems · Fuzzy Logic and Control Systems

Methods: Knowledge Distillation