TL;DR
This paper introduces a bidirectional distillation framework for recommender systems, enabling both teacher and student models to mutually improve, surpassing traditional unidirectional methods and enhancing overall recommendation performance.
Contribution
The paper proposes a novel bidirectional distillation approach with a rank discrepancy-aware sampling scheme, allowing mutual learning and significant performance gains in recommender systems.
Findings
Both teacher and student models improve after bidirectional training.
The proposed method outperforms state-of-the-art unidirectional distillation techniques.
The rank discrepancy-aware sampling scheme effectively handles the large performance gap between the teacher and the student.
Abstract
Recommender systems (RS) have started to employ knowledge distillation, a model compression technique that trains a compact model (student) with knowledge transferred from a cumbersome model (teacher). State-of-the-art methods rely on unidirectional distillation, which transfers knowledge only from the teacher to the student, under the assumption that the teacher is always superior to the student. However, we demonstrate that the student performs better than the teacher on a significant proportion of the test set, especially for RS. Based on this observation, we propose the Bidirectional Distillation (BD) framework, whereby the teacher and the student collaboratively improve each other. Specifically, each model is trained with a distillation loss that makes it follow the other's predictions, along with its original loss function. For effective bidirectional…
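The training objective described in the abstract (each model's original loss plus a distillation term that pulls it toward the other model's predictions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the use of binary cross-entropy for both terms, the function names, and the weights `lambda_t` and `lambda_s` are all assumptions made for the example, and the rank discrepancy-aware sampling step is omitted.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(target, prob, eps=1e-12):
    """Binary cross-entropy between a target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def bidirectional_losses(labels, teacher_logits, student_logits,
                         lambda_t=0.5, lambda_s=0.5):
    """Return (teacher_loss, student_loss), averaged over the given items.

    Each loss is the model's own loss on the labels plus a distillation
    term that pushes it toward the other model's prediction.
    Assumptions (not taken from the source): both terms are binary
    cross-entropy, and lambda_t / lambda_s are hypothetical weights.
    In an actual training loop the other model's prediction would be
    treated as a fixed target (no gradient flows through it).
    """
    teacher_loss = 0.0
    student_loss = 0.0
    for y, z_t, z_s in zip(labels, teacher_logits, student_logits):
        p_t, p_s = sigmoid(z_t), sigmoid(z_s)
        # Teacher: original loss + follow the student's prediction.
        teacher_loss += bce(y, p_t) + lambda_t * bce(p_s, p_t)
        # Student: original loss + follow the teacher's prediction.
        student_loss += bce(y, p_s) + lambda_s * bce(p_t, p_s)
    n = len(labels)
    return teacher_loss / n, student_loss / n


if __name__ == "__main__":
    labels = [1, 0, 1]                 # toy interaction labels
    teacher_logits = [2.0, -1.0, 0.2]  # toy teacher scores
    student_logits = [0.5, 0.3, 1.5]   # toy student scores
    t_loss, s_loss = bidirectional_losses(labels, teacher_logits, student_logits)
    print(f"teacher loss: {t_loss:.4f}, student loss: {s_loss:.4f}")
```

Setting `lambda_t = 0` in this sketch removes the teacher's distillation term, so only the student learns from the other model; that corresponds to the unidirectional setting the abstract contrasts with.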
