Adaptive Weighting in Knowledge Distillation: An Axiomatic Framework for Multi-Scale Teacher Ensemble Optimization

Aaron R. Flouro; Shawn P. Chadwick

arXiv:2601.17910 · cs.LG · January 27, 2026

Open Access

TL;DR

This paper introduces an axiomatic framework for adaptive weighting in multi-teacher knowledge distillation, enabling principled analysis and design of weighting schemes across token, task, and context scales.

Contribution

The paper formalizes structural conditions for adaptive weighting operators, analyzes their properties, and decouples the theoretical guarantees from any specific weighting formula, advancing the theory of multi-teacher distillation.

Findings

01. Established existence and non-uniqueness of conforming operators
02. Analyzed convergence of gradient-based optimization
03. Provided robustness and safety analysis for distillation methods

Abstract

Knowledge distillation with multiple teachers is increasingly used to improve robustness, efficiency, and safety, yet existing approaches rely largely on heuristic or implementation-specific weighting schemes. This paper develops an operator-agnostic axiomatic framework for adaptive weighting in multi-teacher knowledge distillation across three complementary scales: token, task, and context. We formalize structural conditions under which adaptive weighting operators are well-defined, admit multiple non-equivalent implementations, and can be hierarchically composed via product-structure normalization. Within this framework, we establish existence and non-uniqueness of conforming operators, characterize convergence of gradient-based optimization under standard assumptions, analyze stability and perturbation robustness, and provide an abstract formulation of safety-constrained…
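
As a rough illustration of what hierarchical composition via product-structure normalization could look like, the sketch below multiplies per-teacher weights from the token, task, and context scales and renormalizes the result onto the probability simplex. The abstract does not give the operators themselves, so everything here is an assumption made for illustration: the softmax scoring, the function names (`compose_weights`, `mix_teachers`, `kl_divergence`), and the toy numbers are hypothetical and are not the authors' method.

```python
import math


def softmax(scores):
    """Map raw scores to strictly positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_weights(token_w, task_w, context_w):
    """Hypothetical product-structure composition (an assumption, not the paper's operator).

    Multiplies the three per-scale weights teacher by teacher, then
    renormalizes so the combined weights sum to 1.
    """
    products = [a * b * c for a, b, c in zip(token_w, task_w, context_w)]
    total = sum(products)
    if total <= 0.0:
        raise ValueError("combined weights must have positive mass")
    return [p / total for p in products]


def mix_teachers(weights, teacher_probs):
    """Convex combination of the teachers' output distributions."""
    vocab = len(teacher_probs[0])
    return [
        sum(w * probs[v] for w, probs in zip(weights, teacher_probs))
        for v in range(vocab)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), a common choice of distillation loss."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


# Toy setup: three teachers, a three-symbol vocabulary (all values made up).
token_w = softmax([2.0, 0.5, 0.1])    # token-scale preference
task_w = softmax([0.0, 1.0, 0.0])     # task-scale preference
context_w = softmax([0.3, 0.3, 0.3])  # context scale is indifferent here

weights = compose_weights(token_w, task_w, context_w)

teacher_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
]
target = mix_teachers(weights, teacher_probs)

student_probs = [0.4, 0.4, 0.2]
loss = kl_divergence(target, student_probs)
```

Under this particular choice, a scale that assigns uniform weights (as `context_w` does above) cancels in the normalization and leaves the combined weights unchanged. Other compositions that satisfy the same structural conditions are possible, which is consistent with the non-uniqueness result stated in the abstract.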

Taxonomy

Topics: Advanced Optimization Algorithms Research · Neural Networks and Applications · Advanced Multi-Objective Optimization Algorithms