Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search
Lei Chen, Fajie Yuan, Jiaxi Yang, Min Yang, and Chengming Li

TL;DR
This paper introduces AdaRec, a framework that uses differentiable Neural Architecture Search to adaptively compress large sequential recommender models into efficient, scene-specific lightweight architectures with improved speed and maintained accuracy.
Contribution
AdaRec is the first to combine differentiable NAS with scene-adaptive knowledge distillation for sequential recommendation models.
Findings
Achieves competitive or better accuracy than larger models.
Provides significant inference speedup on real-world datasets.
Discovers diverse neural architectures tailored to different recommendation scenes.
Abstract
Sequential recommender systems (SRS) have become a research hotspot due to their power in modeling users' dynamic interests and sequential behavioral patterns. To maximize model expressive ability, a default choice is to apply a larger and deeper network architecture, which, however, often brings high network latency when generating online recommendations. Naturally, we argue that compressing the heavy recommendation models into middle- or lightweight neural networks is of great importance for practical production systems. To realize such a goal, we propose AdaRec, a knowledge distillation (KD) framework which compresses knowledge of a teacher model into a student model adaptively according to its recommendation scene by using differentiable Neural Architecture Search (NAS). Specifically, we introduce a target-oriented distillation loss to guide the structure search process for finding…
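The abstract names two ingredients, differentiable NAS and knowledge distillation, without giving their exact form here. The sketch below is only a generic illustration of both ideas and is not AdaRec's actual search space or its target-oriented loss: it assumes a DARTS-style continuous relaxation (a softmax over architecture weights that mixes candidate operations) and a standard temperature-softened KL distillation term. All names (`mixed_output`, `distillation_loss`, `derive_architecture`, the toy candidate operations) are hypothetical.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_output(x, candidate_ops, alphas):
    """Continuous relaxation: weight every candidate op by softmax(alphas).

    Because the mixture is a smooth function of alphas, the architecture
    weights can in principle be tuned by gradient descent (assumed setup).
    """
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def derive_architecture(alphas):
    """After search, keep the index of the op with the largest weight."""
    return max(range(len(alphas)), key=lambda i: alphas[i])


# Toy, hypothetical candidate operations on a scalar input.
candidate_ops = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]
uniform_alphas = [0.0, 0.0, 0.0]
mixed = mixed_output(3.0, candidate_ops, uniform_alphas)
same = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
chosen = derive_architecture([0.1, 1.5, -0.3])
```

In a setup like this, the distillation term would be part of the objective that drives updates to the architecture weights, so that the student structure selected at the end is the one that best mimics the teacher. How AdaRec conditions this on the recommendation scene is not specified in the text above.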
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
