Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Yijun Dong, Kevin Miller, Qi Lei, Rachel Ward

TL;DR
This paper provides a theoretical analysis of relational knowledge distillation (RKD) in semi-supervised learning, showing it provably learns low-error clusterings and offers label-efficient semi-supervised classification.
Contribution
It introduces a spectral clustering perspective for RKD, establishes a clustering error bound, and unifies RKD with data augmentation regularization within a cluster-aware framework.
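The spectral clustering view of RKD can be illustrated with a small sketch. It assumes several details not stated in this summary: the teacher induces a graph through Gaussian affinities between its features, normalized as D^{-1/2} W D^{-1/2}, and the relational loss is the squared Frobenius gap between the student's pairwise inner products and that graph. Under these assumptions the loss is minimized by the top-k eigenvectors of the graph, which is exactly a spectral embedding. The names `teacher_graph`, `rkd_loss`, `rkd_minimizer`, and `cluster_rows` are hypothetical helpers, and `cluster_rows` is a simplified stand-in for k-means. This is not the paper's construction.

```python
import numpy as np

def teacher_graph(teacher_feats, bandwidth=1.0):
    """Assumed graph: Gaussian affinities between teacher features,
    symmetrically normalized as D^{-1/2} W D^{-1/2}."""
    diffs = teacher_feats[:, None, :] - teacher_feats[None, :, :]
    W = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))

def rkd_loss(F, A):
    """Assumed relational loss: squared Frobenius gap between the
    student's pairwise inner products F F^T and the teacher graph A."""
    return float(np.sum((F @ F.T - A) ** 2))

def rkd_minimizer(A, k):
    """Global minimizer of rkd_loss over n x k outputs: the best rank-k PSD
    approximation of A, i.e. top-k eigenvectors scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(A)           # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def cluster_rows(F, k):
    """Simple stand-in for k-means on the spectral embedding: farthest-point
    centers on row-normalized rows, then assign each row by cosine similarity."""
    U = F / np.linalg.norm(F, axis=1, keepdims=True)
    centers = [0]
    for _ in range(1, k):
        closest = np.max(U @ U[centers].T, axis=1)  # similarity to nearest center
        centers.append(int(np.argmin(closest)))
    return np.argmax(U @ U[centers].T, axis=1)

# Toy example: two well-separated blobs in the teacher's feature space.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
teacher = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(10.0, 0.3, (20, 2))])
A = teacher_graph(teacher)
F_star = rkd_minimizer(A, k=2)
pred = cluster_rows(F_star, k=2)
print(pred)
print(rkd_loss(F_star, A) <= rkd_loss(rng.normal(size=(40, 2)), A))
```

The closed-form minimizer follows from the Eckart-Young theorem for symmetric matrices. That same theorem guarantees the second print is `True` for any competing output matrix. In practice a student network would approach this embedding by gradient descent rather than by an explicit eigendecomposition.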
Findings
RKD provably reduces clustering error in semi-supervised learning.
RKD admits sample complexity bounds when only limited unlabeled data are available.
RKD offers a global spectral clustering perspective complementing local regularization methods.
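Clustering error measures the discrepancy between a predicted clustering and the ground truth. Cluster IDs are arbitrary, so a natural version compares the two only after the best relabeling of predicted clusters. The sketch below uses that common permutation-invariant misclassification rate. The paper's exact definition may differ, and `clustering_error` is a hypothetical helper.

```python
from itertools import permutations

import numpy as np

def clustering_error(pred, truth, k):
    """Fraction of points assigned to the wrong cluster, minimized over all
    k! relabelings of the predicted clusters (cluster IDs are arbitrary)."""
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    best = 1.0
    for perm in permutations(range(k)):
        relabeled = np.asarray(perm)[pred]   # map predicted ID j -> perm[j]
        best = min(best, float(np.mean(relabeled != truth)))
    return best

print(clustering_error([1, 1, 0, 0, 2, 2], [0, 0, 1, 1, 2, 2], k=3))  # 0.0
print(clustering_error([1, 1, 0, 0, 2, 0], [0, 0, 1, 1, 2, 2], k=3))  # one of six points misplaced, about 0.167
```

Enumerating permutations costs k! and is only sensible for a handful of clusters. For larger k, the same optimal relabeling can be found with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` applied to the negated confusion matrix.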
Abstract
Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various knowledge distillation paradigms. In this work, we take an initial step toward a theoretical understanding of relational knowledge distillation (RKD), with a focus on semi-supervised classification problems. We start by casting RKD as spectral clustering on a population-induced graph unveiled by a teacher model. Via a notion of clustering error that quantifies the discrepancy between the predicted and ground truth clusterings, we illustrate that RKD over the population provably leads to low clustering error. Moreover, we provide a sample complexity bound for RKD with limited unlabeled samples. For semi-supervised learning, we further demonstrate…
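The TL;DR frames the semi-supervised payoff as label efficiency. The intuition is that if the learned clusters align with the classes, a few labels per cluster suffice to label every point. The sketch below illustrates that cluster-then-label idea with a majority vote over the labeled points in each predicted cluster. It is an illustrative assumption, not the procedure from the (truncated) abstract, and `propagate_labels` is a hypothetical helper.

```python
import numpy as np

def propagate_labels(clusters, labeled_idx, labeled_y, num_classes):
    """Assign every point the majority label among the labeled points in its
    predicted cluster; clusters with no labeled point stay at -1 (unresolved)."""
    clusters = np.asarray(clusters)
    labeled_idx = np.asarray(labeled_idx)
    labeled_y = np.asarray(labeled_y)
    preds = np.full(clusters.shape, -1)
    for c in np.unique(clusters):
        in_c = clusters[labeled_idx] == c          # which labeled points fall in cluster c
        if in_c.any():
            votes = np.bincount(labeled_y[in_c], minlength=num_classes)
            preds[clusters == c] = int(np.argmax(votes))
    return preds

# One label per cluster is enough to label all eight points.
print(propagate_labels([0, 0, 0, 1, 1, 1, 2, 2], labeled_idx=[1, 4, 6],
                       labeled_y=[2, 0, 1], num_classes=3))  # [2 2 2 0 0 0 1 1]
```

Under this scheme, classification error is bounded by clustering error whenever every cluster receives at least one correctly labeled point. This is one informal way to connect low clustering error to label-efficient classification.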
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
Methods: Focus · Spectral Clustering · Knowledge Distillation
