Adaptive Weighting in Knowledge Distillation: An Axiomatic Framework for Multi-Scale Teacher Ensemble Optimization
Aaron R. Flouro, Shawn P. Chadwick

TL;DR
This paper introduces an axiomatic framework for adaptive weighting in multi-teacher knowledge distillation, enabling principled analysis and design of weighting schemes across token, task, and context scales.
Contribution
It formalizes structural conditions for adaptive weighting operators, analyzes their properties, and decouples theoretical guarantees from specific formulas, advancing multi-teacher distillation theory.
Findings
Established existence and non-uniqueness of conforming operators
Analyzed convergence of gradient-based optimization
Provided robustness and safety analysis for distillation methods
Abstract
Knowledge distillation with multiple teachers is increasingly used to improve robustness, efficiency, and safety, yet existing approaches rely largely on heuristic or implementation-specific weighting schemes. This paper develops an operator-agnostic axiomatic framework for adaptive weighting in multi-teacher knowledge distillation across three complementary scales: token, task, and context. We formalize structural conditions under which adaptive weighting operators are well-defined, admit multiple non-equivalent implementations, and can be hierarchically composed via product-structure normalization. Within this framework, we establish existence and non-uniqueness of conforming operators, characterize convergence of gradient-based optimization under standard assumptions, analyze stability and perturbation robustness, and provide an abstract formulation of safety-constrained…
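To make "hierarchically composed via product-structure normalization" concrete, here is a minimal sketch of one possible conforming operator. The abstract is operator-agnostic and gives no formula, so every detail below is an assumption for illustration: the softmax normalizer, the elementwise product followed by renormalization, the mixture-of-teachers target, and all function names and numeric scores are hypothetical, not the authors' method.

```python
import math


def normalize(scores):
    """Softmax: map raw scores to non-negative weights that sum to 1 (assumed normalizer)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_weights(token_w, task_w, context_w):
    """Multiply the per-scale weights of each teacher, then renormalize over teachers.

    This is one illustrative reading of "product-structure normalization".
    """
    products = [a * b * c for a, b, c in zip(token_w, task_w, context_w)]
    total = sum(products)
    if total == 0.0:
        raise ValueError("all composed weights are zero; cannot normalize")
    return [p / total for p in products]


def distillation_target(teacher_probs, weights):
    """Weighted mixture of teacher distributions over a shared vocabulary (assumed target)."""
    vocab_size = len(teacher_probs[0])
    return [
        sum(w * probs[v] for w, probs in zip(weights, teacher_probs))
        for v in range(vocab_size)
    ]


# Hypothetical scores for three teachers at each scale.
token_w = normalize([2.0, 1.0, 0.5])
task_w = normalize([0.0, 1.0, 0.0])
context_w = normalize([0.3, 0.3, 0.3])  # equal scores give uniform weights

weights = compose_weights(token_w, task_w, context_w)

# Hypothetical teacher output distributions over a 3-token vocabulary.
teacher_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
target = distillation_target(teacher_probs, weights)
```

Because each scale's weights are non-negative and the product is renormalized, the composed weights again lie on the probability simplex, so the mixed target remains a valid distribution. Other normalizers would satisfy the same structural property, which is in keeping with the non-uniqueness the abstract describes.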
Taxonomy
Topics: Advanced Optimization Algorithms Research · Neural Networks and Applications · Advanced Multi-Objective Optimization Algorithms
