Aligning Logits Generatively for Principled Black-Box Knowledge Distillation

Jing Ma; Xiang Xiang; Ke Wang; Yuchuan Wu; Yongbin Li

arXiv:2205.10490·cs.LG·April 2, 2024

TL;DR

This paper introduces Mapping-Emulation KD (MEKD), a knowledge distillation method that aligns logits generatively. It targets black-box model compression for cloud-to-edge applications, where the edge side has limited access to the teacher's data and model.

Contribution

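The paper formalizes a two-step workflow (deprivatization, then distillation) and derives a new optimization direction, from logits to cell boundary, that differs from direct logits alignment. The authors report improved distillation performance over previous methods.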

Findings

01. Outperforms the previous state of the art on various benchmarks.

02. Handles soft and hard teacher responses in the same way, without treating them as separate cases.

03. Demonstrates strong results in black-box model compression scenarios.

Abstract

Black-Box Knowledge Distillation (B2KD) is a formulated problem for cloud-to-edge model compression with invisible data and models hosted on the server. B2KD faces challenges such as limited Internet exchange and edge-cloud disparity of data distributions. In this paper, we formalize a two-step workflow consisting of deprivatization and distillation, and theoretically provide a new optimization direction from logits to cell boundary different from direct logits alignment. With its guidance, we propose a new method Mapping-Emulation KD (MEKD) that distills a black-box cumbersome model into a lightweight one. Our method does not differentiate between treating soft or hard responses, and consists of: 1) deprivatization: emulating the inverse mapping of the teacher function with a generator, and 2) distillation: aligning low-dimensional logits of the teacher and student models by reducing…
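As a rough illustration of what treating soft and hard responses alike could look like at the black-box interface, the sketch below maps either kind of teacher response to a probability vector and scores a student against it. It is not taken from the paper or the haiv-lab/mekd repository: the names `to_distribution` and `alignment_loss` are hypothetical, cross-entropy is used here only as a stand-in alignment objective, and the generator (deprivatization) and the cell-boundary objective are left out entirely.

```python
import math
from typing import List, Sequence, Union

# A black-box teacher is assumed to return either a hard label (class index)
# or a soft response (a vector of class probabilities).
Response = Union[int, Sequence[float]]


def to_distribution(response: Response, num_classes: int) -> List[float]:
    """Map a teacher response to a probability vector (hypothetical helper).

    A hard label becomes a one-hot vector; a soft response is renormalised
    so that it sums to one.
    """
    if isinstance(response, int):
        return [1.0 if k == response else 0.0 for k in range(num_classes)]
    total = sum(response)
    return [p / total for p in response]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def alignment_loss(student_logits: Sequence[float], response: Response) -> float:
    """Cross-entropy between the teacher response and the student prediction.

    This is a stand-in objective for illustration, not the paper's loss.
    """
    q = softmax(student_logits)
    p = to_distribution(response, len(student_logits))
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    student = [2.0, 0.5, -1.0]
    print(alignment_loss(student, 0))                # hard response
    print(alignment_loss(student, [0.7, 0.2, 0.1]))  # soft response
```

Because both response types pass through the same `to_distribution` step, the downstream loss needs no separate branch for hard labels, which is the only point this sketch is meant to make.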

Code & Models

Repositories

haiv-lab/mekd (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning

Methods: Knowledge Distillation