Multi-Teacher Ensemble Distillation: A Mathematical Framework for Probability-Domain Knowledge Aggregation
Aaron R. Flouro, Shawn P. Chadwick

TL;DR
This paper introduces an axiomatic, operator-theoretic framework for multi-teacher ensemble knowledge distillation, providing theoretical guarantees and multiple aggregation strategies without prescribing a specific formula.
Contribution
The paper develops an axiomatic framework for knowledge aggregation in multi-teacher distillation, proves the existence and non-uniqueness of operator families satisfying its axioms, and establishes operator-agnostic theoretical guarantees for them.
Findings
Multi-teacher aggregation reduces both stochastic variance and systematic supervisory bias under heterogeneous teachers.
Multiple valid aggregation operators satisfy core axioms.
Classical variance reduction results extend to correlated-error regimes.
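For orientation, the classical result that the last finding refers to can be written out in its simplest textbook form. This is a sketch under assumptions not stated in the summary: K teachers with uniform weights, error terms \varepsilon_k of equal variance \sigma^2, and a common pairwise correlation \rho. The paper's own statement may be more general.

```latex
\operatorname{Var}\!\left(\frac{1}{K}\sum_{k=1}^{K}\varepsilon_k\right)
  = \frac{\sigma^2}{K} + \frac{K-1}{K}\,\rho\,\sigma^2
  = \rho\,\sigma^2 + \frac{(1-\rho)\,\sigma^2}{K}
```

With \rho = 0 this reduces to the familiar \sigma^2 / K rate, while for \rho > 0 the variance of the averaged error cannot fall below \rho\,\sigma^2 no matter how many teachers are added.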
Abstract
Building on the probability-domain distillation framework of Sparse-KD, we develop an axiomatic, operator-theoretic framework for multi-teacher ensemble knowledge distillation. Rather than prescribing a specific aggregation formula, we define five core axioms governing valid knowledge aggregation operators, encompassing convexity, positivity, continuity, weight monotonicity, and temperature coherence. We prove the existence and non-uniqueness of operator families satisfying these axioms, establishing that multiple distinct aggregation mechanisms conform to the same foundational principles. Within this framework, we establish operator-agnostic guarantees showing that multi-teacher aggregation reduces both stochastic variance and systematic supervisory bias under heterogeneous teachers, while providing Jensen-type bounds, log-loss guarantees, and safety attenuation properties. For…
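The non-uniqueness claim can be illustrated with a minimal sketch. The two operators below (a weighted arithmetic mean and a normalized weighted geometric mean of temperature-softened teacher distributions) are standard textbook choices, not formulas taken from the paper, and the function names, logits, weights, and temperature are all hypothetical. Both return strictly positive distributions on the probability simplex, yet they generally disagree.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def arithmetic_aggregate(teacher_probs, weights):
    """Convex combination of teacher distributions (illustrative operator)."""
    p = np.asarray(teacher_probs, dtype=float)  # shape (K, C)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                             # normalize weights to sum to 1
    return w @ p

def geometric_aggregate(teacher_probs, weights, eps=1e-12):
    """Normalized weighted geometric mean (a second illustrative operator)."""
    p = np.asarray(teacher_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    log_mix = w @ np.log(p + eps)               # weighted average of log-probabilities
    log_mix = log_mix - log_mix.max()
    q = np.exp(log_mix)
    return q / q.sum()                          # renormalize onto the simplex

# Hypothetical logits from three teachers over three classes.
teacher_logits = np.array([
    [2.0, 1.0, 0.1],
    [1.5, 1.2, 0.3],
    [0.5, 2.0, 0.2],
])
teacher_probs = softmax(teacher_logits, temperature=2.0)
weights = [0.5, 0.3, 0.2]

p_arith = arithmetic_aggregate(teacher_probs, weights)
p_geom = geometric_aggregate(teacher_probs, weights)
print("arithmetic:", p_arith)
print("geometric: ", p_geom)
```

Whether either operator satisfies all five of the paper's axioms is not checked here; the sketch only shows that distinct aggregation rules can each produce a valid target distribution for the student.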
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Mobile Crowdsensing and Crowdsourcing · Distributed Sensor Networks and Detection Algorithms
