# Tailored knowledge distillation with automated loss function learning

**Authors:** Sheng Ran, Tao Huang, Wuyue Yang

PMC · DOI: 10.1371/journal.pone.0325599 · PLOS One · 2025-06-11

## TL;DR

This paper introduces Learnable Knowledge Distillation (LKD), a model-compression method that learns its distillation losses automatically instead of relying on hand-crafted, task-specific ones, and reports gains on CIFAR and ImageNet.

## Contribution

The novel Learnable Knowledge Distillation (LKD) framework autonomously learns adaptive distillation losses using bi-level optimization.
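
The bi-level structure can be written down generically. The notation below is an assumption introduced for illustration, not taken from the paper: $\theta$ stands for the student parameters, $\phi$ for the parameters of the learnable loss, $f_T$ for the teacher, and $\mathcal{D}_{\text{train}}$, $\mathcal{D}_{\text{val}}$ for training and validation splits. The inner problem trains the student under the learned loss; the outer problem adjusts the loss so that the resulting student has a low validation loss.

```latex
\begin{aligned}
\min_{\phi} \quad & \mathcal{L}_{\text{val}}\bigl(\theta^{*}(\phi);\, \mathcal{D}_{\text{val}}\bigr) \\
\text{s.t.} \quad & \theta^{*}(\phi) = \arg\min_{\theta}\; \mathcal{L}_{\phi}\bigl(f_{\theta},\, f_{T};\, \mathcal{D}_{\text{train}}\bigr)
\end{aligned}
```

This is the textbook form of a bi-level problem; how the paper approximates $\theta^{*}(\phi)$ and parameterizes $\mathcal{L}_{\phi}$ is described in the full text.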

## Key findings

- LKD outperforms state-of-the-art KD methods on datasets like CIFAR and ImageNet.
- LKD achieves 73.62% accuracy on ImageNet with MobileNet, surpassing the authors' KD baseline by 2.94%.

## Abstract

Knowledge Distillation (KD) is one of the most effective and widely used methods for model compression of large models. It has achieved significant success with the meticulous development of distillation losses. However, most state-of-the-art KD losses are manually crafted and task-specific, raising questions about their contribution to distillation efficacy. This paper unveils Learnable Knowledge Distillation (LKD), a novel approach that autonomously learns adaptive, performance-driven distillation losses. LKD revolutionizes KD by employing a bi-level optimization strategy and an iterative optimization that differentiably learns distillation losses aligned with the students’ validation loss. Building upon our proposed generic loss networks for logits and intermediate features, we derive a dynamic optimization strategy to adjust losses based on the student models’ changing states for enhanced performance and adaptability. Additionally, for a more robust loss, we introduce a uniform sampling of diverse previously-trained student models to train the loss with various convergence rates of predictions. With the more universally adaptable distillation framework of LKD, we conduct experiments on various datasets such as CIFAR and ImageNet, demonstrating our superior performance without the need for task-specific adjustments. For example, our LKD achieves 73.62% accuracy with the MobileNet model on ImageNet, significantly surpassing our KD baseline by 2.94%.
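
To make the idea of tuning a distillation loss against validation performance concrete, here is a deliberately tiny sketch in plain Python. It is not the LKD method: the paper learns loss networks differentiably, whereas this toy has a single loss parameter (the temperature of the classic softened-KL distillation loss) and picks it by grid search. The "student" is just a vector of logits for one input, and the names `train_student`, `select_temperature`, and `TEACHER_LOGITS` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """Classic KD loss: T^2 * KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def kd_grad(student_logits, teacher_logits, temperature):
    """Gradient of kd_loss w.r.t. the student logits, which is T * (q - p)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return [temperature * (qi - pi) for pi, qi in zip(p, q)]


def train_student(temperature, teacher_logits, steps=20, lr=0.5):
    """Inner level: gradient descent on the KD loss for a fixed temperature."""
    student = [0.0] * len(teacher_logits)
    for _ in range(steps):
        grad = kd_grad(student, teacher_logits, temperature)
        student = [s - lr * g for s, g in zip(student, grad)]
    return student


def validation_loss(student_logits, label):
    """Cross-entropy of the trained student on a held-out label."""
    return -math.log(softmax(student_logits)[label])


def select_temperature(candidates, teacher_logits, val_label):
    """Outer level (grid search stand-in): keep the temperature whose
    trained student has the lowest validation loss."""
    scored = [
        (validation_loss(train_student(t, teacher_logits), val_label), t)
        for t in candidates
    ]
    best_loss, best_t = min(scored)
    return best_t, best_loss


# Toy data (assumed values): teacher logits for one input, true class 0.
TEACHER_LOGITS = [4.0, 1.0, 0.5]
best_t, best_loss = select_temperature([1.0, 2.0, 4.0, 8.0], TEACHER_LOGITS, val_label=0)
```

The structure mirrors the two levels only in outline: `train_student` plays the role of student training under a given loss, and `select_temperature` plays the role of choosing the loss by validation performance. Replacing the grid search with gradients through the student update, and the scalar temperature with a loss network, is where the approach described in the abstract goes beyond this sketch.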

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12157245/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12157245/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12157245/full.md

---
Source: https://tomesphere.com/paper/PMC12157245