A Dynamic Knowledge Distillation Method Based on the Gompertz Curve
Han Yang, Guangjun Qin

TL;DR
This paper proposes Gompertz-CNN, a dynamic knowledge distillation method that models student learning progression with the Gompertz curve, improving knowledge transfer efficiency and accuracy over traditional methods.
Contribution
It introduces a stage-aware distillation strategy using the Gompertz growth model to adaptively weight distillation losses during training.
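The Gompertz weighting can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the common three-parameter form w(t) = a · exp(−b · exp(−c · t)), with training progress normalized as t = epoch / total_epochs. The function name gompertz_weight and the defaults a = 1, b = 5, c = 5 are hypothetical choices, because this summary does not give the paper's parameterization.

```python
import math


def gompertz_weight(epoch: int, total_epochs: int,
                    a: float = 1.0, b: float = 5.0, c: float = 5.0) -> float:
    """Hypothetical Gompertz schedule for the distillation-loss weight.

    w(t) = a * exp(-b * exp(-c * t)), with t = epoch / total_epochs clipped to [0, 1].
    a: upper asymptote (the weight the schedule saturates toward)
    b: displacement along t (larger b delays the rise)
    c: growth rate (larger c makes the mid-phase rise steeper)
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    t = min(max(epoch / total_epochs, 0.0), 1.0)  # normalized training progress
    return a * math.exp(-b * math.exp(-c * t))


if __name__ == "__main__":
    # Print the weight at a few points of a 100-epoch run.
    for epoch in (0, 25, 50, 75, 100):
        print(epoch, round(gompertz_weight(epoch, 100), 3))
```

With these defaults the weight starts near zero, rises most steeply at t = ln(b) / c ≈ 0.32 (about a third of the way through training), and levels off just below a. This matches the slow-start, rapid-growth, saturation pattern the paper attributes to student learning.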
Findings
Achieves an accuracy improvement of up to 8% on CIFAR-10.
Outperforms traditional distillation methods on multiple architectures.
Demonstrates effective modeling of learning progression with the Gompertz curve.
Abstract
This paper introduces a novel dynamic knowledge distillation framework, Gompertz-CNN, which integrates the Gompertz growth model into the training process to address the limitations of traditional knowledge distillation. Conventional methods often fail to capture the evolving cognitive capacity of student models, leading to suboptimal knowledge transfer. To overcome this, we propose a stage-aware distillation strategy that dynamically adjusts the weight of the distillation loss according to the Gompertz curve, reflecting the student's learning progression: slow initial growth, rapid mid-phase improvement, and late-stage saturation. Our framework incorporates the Wasserstein distance to measure feature-level discrepancies and gradient matching to align the backpropagation behavior of the teacher and student models. These components are unified under a multi-loss objective, where the Gompertz…
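The multi-loss objective described in the abstract can be sketched as below. Because the abstract is cut off, the exact way the terms are combined is not stated here. This sketch assumes a plain weighted sum of cross-entropy, a Hinton-style KD term scaled by a Gompertz-derived weight, a feature term, and a gradient term. Several simplifications are hypothetical:

- The feature discrepancy is the 1-D Wasserstein-1 distance between flattened teacher and student activations, treated as equal-size empirical samples.
- Gradient matching is taken as one minus the cosine similarity of precomputed gradient vectors. In practice these would come from a framework's autograd.
- The names total_loss, w_kd, lam_feat, lam_grad and T are illustrative.

```python
import numpy as np


def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean cross-entropy of student logits against integer class labels."""
    log_p = np.log(softmax(logits) + 1e-12)
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))


def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs, batch mean, scaled by T^2."""
    p_t = softmax(teacher_logits, T)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    return float((T ** 2) * np.mean(np.sum(p_t * (log_p_t - log_p_s), axis=-1)))


def wasserstein_1d(u, v):
    """W1 between two equal-size 1-D empirical distributions: mean gap of sorted samples."""
    u = np.sort(np.ravel(np.asarray(u, dtype=float)))
    v = np.sort(np.ravel(np.asarray(v, dtype=float)))
    if u.shape != v.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return float(np.mean(np.abs(u - v)))


def gradient_matching_loss(g_student, g_teacher, eps=1e-12):
    """1 - cosine similarity between flattened gradient vectors (0 when aligned)."""
    gs = np.ravel(np.asarray(g_student, dtype=float))
    gt = np.ravel(np.asarray(g_teacher, dtype=float))
    cos = gs @ gt / (np.linalg.norm(gs) * np.linalg.norm(gt) + eps)
    return float(1.0 - cos)


def total_loss(student_logits, teacher_logits, labels,
               feat_s, feat_t, grad_s, grad_t, w_kd,
               lam_feat=1.0, lam_grad=1.0, T=4.0):
    """Hypothetical additive objective; w_kd is the Gompertz weight at the current step."""
    return (cross_entropy(student_logits, labels)
            + w_kd * kd_loss(student_logits, teacher_logits, T)
            + lam_feat * wasserstein_1d(feat_s, feat_t)
            + lam_grad * gradient_matching_loss(grad_s, grad_t))
```

Here w_kd would be the Gompertz curve evaluated at the current training progress. The KD term therefore contributes little early in training and reaches its full share once the curve saturates. The feature and gradient terms keep fixed coefficients in this sketch. This excerpt does not say whether the paper also schedules them.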
