PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation

Mike Ranzinger; Jon Barker; Greg Heinrich; Pavlo Molchanov; Bryan Catanzaro; Andrew Tao

arXiv:2410.01680·cs.LG·October 3, 2024

TL;DR

This paper introduces PHI-S, a novel distribution balancing technique using Hadamard matrices for label-free multi-teacher distillation, improving student model quality by standardizing activation statistics.

Contribution

The paper proposes PHI Standardization (PHI-S), a new method employing Hadamard matrices for isotropic distribution alignment in multi-teacher distillation without labels.

Findings

01  PHI-S yields better student model quality than the other normalization techniques studied.

02  Hadamard matrices enable isotropic standardization of activation distributions, in which every dimension is standardized using the same scale.

03  Balancing the teachers' activation distributions improves downstream teacher-matching metrics.

Abstract

Various visual foundation models have distinct strengths and weaknesses, both of which can be improved through heterogeneous multi-teacher knowledge distillation without labels, termed "agglomerative models." We build upon this body of work by studying the effect of the teachers' activation statistics, particularly the impact of the loss function on the resulting student model quality. We explore a standard toolkit of statistical normalization techniques to better align the different distributions and assess their effects. Further, we examine the impact on downstream teacher-matching metrics, which motivates the use of Hadamard matrices. With these matrices, we demonstrate useful properties, showing how they can be used for isotropic standardization, where each dimension of a multivariate distribution is standardized using the same scale. We call this technique "PHI Standardization"…
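The abstract is truncated here and does not spell out the construction, so the following is only a minimal sketch of isotropic standardization under stated assumptions, not the authors' implementation. It assumes the transform is a PCA rotation followed by a normalized Hadamard rotation and a single scalar rescale. It relies on one known property: for a Hadamard matrix H with entries ±1/√n, the diagonal of H Λ Hᵀ is constant and equal to the mean of the eigenvalues in Λ, so one scalar standardizes every dimension. The function names `sylvester_hadamard` and `isotropic_standardize` are hypothetical, and the Sylvester construction restricts the dimension to powers of two.

```python
import numpy as np


def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix (entries +1/-1) via Sylvester's construction.

    Hypothetical helper for illustration; n must be a power of two.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    while h.shape[0] < n:
        h = np.kron(base, h)
    return h


def isotropic_standardize(x):
    """Standardize every dimension of x with one shared scale.

    x has shape (num_samples, dim). Returns (y, transform, mean) such that
    y = (x - mean) @ transform.T and every column of y has unit variance.
    Assumed recipe (not confirmed by the page): PCA rotation, then a
    normalized Hadamard rotation, then a single scalar rescale.
    """
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)

    # Eigendecomposition of the symmetric covariance: cov = U diag(eigvals) U^T.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Orthonormal Hadamard rotation: every entry has magnitude 1/sqrt(dim).
    dim = x.shape[1]
    h = sylvester_hadamard(dim) / np.sqrt(dim)

    # Rotating the decorrelated data by h spreads the variance evenly, so each
    # output dimension has variance mean(eigvals).
    rotation = h @ eigvecs.T
    scale = 1.0 / np.sqrt(eigvals.mean())

    transform = scale * rotation
    y = (x - mean) @ transform.T
    return y, transform, mean
```

Because `transform` is an orthogonal matrix times a scalar, it can be inverted by a transpose and a division, and errors measured in the standardized space are the original errors rescaled by one global factor rather than reweighted per dimension.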

Code & Models

Repositories

nvlabs/radio (PyTorch)

Taxonomy

Topics: Process Optimization and Integration

Methods: Knowledge Distillation · ALIGN