Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

TL;DR
This paper introduces Mapping-Emulation KD (MEKD), a knowledge distillation method that aligns logits generatively. It targets black-box model compression for cloud-to-edge applications, where access to data is limited.
Contribution
It formalizes a two-step workflow of deprivatization followed by distillation, and proposes a new optimization direction, from logits to cell boundary, that differs from direct logits alignment and improves distillation performance over previous methods.
Findings
MEKD outperforms the previous state-of-the-art methods on various benchmarks.
It handles soft and hard teacher responses with the same procedure, without treating the two cases differently.
It shows strong results in black-box model compression scenarios.
Abstract
Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction from logits to cell boundary different from direct logits alignment. With its guidance, we propose a new method Mapping-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between treating soft or hard responses, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning low-dimensional logits of the teacher and student models by reducing…
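To make the soft-versus-hard distinction in the abstract concrete, here is a minimal sketch of one way a client could put both kinds of black-box teacher response into a common form. It is an illustration under stated assumptions, not the authors' implementation: the names `to_response_vector` and `l1_gap` are hypothetical, a hard response is assumed to be an integer class index, a soft response is assumed to be a probability vector, and the L1 gap is only a placeholder discrepancy, not the objective used in the paper.

```python
from typing import List, Sequence, Union

# Assumption: a hard response is a class index, a soft response is a
# sequence of per-class probabilities returned by the black-box teacher.
Response = Union[int, Sequence[float]]


def to_response_vector(response: Response, num_classes: int) -> List[float]:
    """Map a teacher response (hard or soft) to a length-num_classes vector."""
    if isinstance(response, int):
        # Hard response: encode the predicted class as a one-hot vector.
        if not 0 <= response < num_classes:
            raise ValueError("class index out of range")
        return [1.0 if k == response else 0.0 for k in range(num_classes)]
    # Soft response: keep the probabilities, checking only the length.
    probs = [float(p) for p in response]
    if len(probs) != num_classes:
        raise ValueError("soft response has the wrong number of classes")
    return probs


def l1_gap(teacher_vec: Sequence[float], student_vec: Sequence[float]) -> float:
    """Placeholder discrepancy between two response vectors (illustrative only)."""
    return sum(abs(t - s) for t, s in zip(teacher_vec, student_vec))


if __name__ == "__main__":
    hard = to_response_vector(2, num_classes=4)
    soft = to_response_vector([0.1, 0.2, 0.6, 0.1], num_classes=4)
    print(hard, soft, l1_gap(hard, soft))
```

Once both response types share one representation, any downstream step can consume them through a single code path, which is the property the abstract describes when it says the method does not differentiate between soft and hard responses.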
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation
