HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression
Gustavo Coelho Haase, Paulo Henrique Dourado da Silva

TL;DR
HPM-KD is a knowledge distillation framework that automates hyperparameter tuning, efficiently combines multiple teachers, and accelerates training, achieving significant compression with minimal accuracy loss.
Contribution
The paper introduces HPM-KD, a six-component framework that automates hyperparameter tuning, improves multi-teacher distillation, and raises computational efficiency for model compression.
Findings
Achieves 10x-15x model compression with 85% accuracy retention.
Reduces training time by 30-40% through parallel processing.
Eliminates manual hyperparameter tuning via meta-learning.
Abstract
Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) capacity gap when distilling from very large teachers to small students, (3) suboptimal coordination in multi-teacher scenarios, and (4) inefficient use of computational resources. We present HPM-KD, a framework that integrates six synergistic components: (i) Adaptive Configuration Manager via meta-learning that eliminates manual hyperparameter tuning, (ii) Progressive Distillation Chain with automatically determined intermediate models, (iii) Attention-Weighted Multi-Teacher Ensemble that learns dynamic per-sample weights, (iv) Meta-Learned Temperature Scheduler that adapts temperature throughout training, (v) Parallel Processing Pipeline with intelligent load balancing, and (vi) Shared…
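The abstract names a per-sample weighted multi-teacher ensemble and a temperature that changes during training, but gives no formulas. As a rough illustration only, the sketch below uses the standard temperature-scaled soft-target loss (KL divergence scaled by T²) with a weighted mixture of teacher distributions. Everything here is an assumption rather than the authors' method: the function names, the `teacher_scores` input (a stand-in for whatever the attention mechanism would learn), and the fixed linear schedule, which merely takes the place of the meta-learned scheduler.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          teacher_scores, temperature):
    """Soft-target distillation loss for a single sample.

    teacher_scores is a hypothetical list of per-sample relevance scores,
    one per teacher; a softmax turns them into weights that sum to 1.
    """
    weights = softmax(teacher_scores)
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    # Blend the teachers' softened distributions class by class.
    blended = [
        sum(w * probs[c] for w, probs in zip(weights, teacher_probs))
        for c in range(len(student_logits))
    ]
    student_probs = softmax(student_logits, temperature)
    # The T^2 factor is the usual gradient-scale correction for soft targets.
    return temperature ** 2 * kl_divergence(blended, student_probs)


def linear_temperature(step, total_steps, t_start=4.0, t_end=1.0):
    """Assumed fixed schedule; NOT the paper's meta-learned scheduler."""
    frac = step / max(total_steps - 1, 1)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teachers = [[2.0, 0.1, -1.0], [1.5, 0.8, -0.5]]
    scores = [0.3, 1.2]  # made-up scores for illustration
    for step in (0, 5, 9):
        t = linear_temperature(step, total_steps=10)
        loss = multi_teacher_kd_loss(student, teachers, scores, t)
        print(f"step={step} T={t:.2f} loss={loss:.4f}")
```

In a real training loop this term would typically be combined with a cross-entropy loss on the ground-truth labels, and the weights would come from a learned module rather than hand-supplied scores.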
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Computational Physics and Python Applications
