TL;DR
Distill2Vec introduces a knowledge distillation approach to create a compact, efficient dynamic graph embedding model that maintains high accuracy while significantly reducing inference latency and model size.
Contribution
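The paper proposes a distillation loss based on Kullback-Leibler divergence that transfers knowledge from a teacher model trained on offline data to a small student model for online data, achieving high accuracy with fewer parameters and lower online inference latency.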
Findings
Outperforms state-of-the-art methods with up to 5% better link prediction accuracy.
Achieves a compression ratio of up to 7:100 compared to baseline models.
Demonstrates reduced online inference latency with a smaller model size.
Abstract
Dynamic graph representation learning strategies are based on different neural architectures to capture the graph evolution over time. However, the underlying neural architectures require a large number of parameters to train and suffer from high online inference latency; that is, several model parameters have to be updated when new data arrive online. In this study we propose Distill2Vec, a knowledge distillation strategy to train a compact model with a low number of trainable parameters, so as to reduce the latency of online inference while keeping the model accuracy high. We design a distillation loss function based on Kullback-Leibler divergence to transfer the acquired knowledge from a teacher model trained on offline data to a small-size student model for online data. Our experiments with publicly available datasets show the superiority of our proposed model over several…
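The abstract describes a distillation loss built on Kullback-Leibler divergence between a teacher and a student. The following is a minimal, generic sketch of that idea in plain Python, not the paper's exact formulation. The function names, the temperature softening, and the weighting parameter `gamma` are assumptions made for illustration and do not come from the source.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_scores, student_scores, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    teacher_probs = softmax(teacher_scores, temperature)
    student_probs = softmax(student_scores, temperature)
    return kl_divergence(teacher_probs, student_probs)

def student_objective(task_loss, teacher_scores, student_scores,
                      gamma=0.5, temperature=2.0):
    """Hypothetical combined objective: the student's own task loss blended
    with the distillation term. gamma is an assumed trade-off weight."""
    kd = distillation_loss(teacher_scores, student_scores, temperature)
    return (1.0 - gamma) * task_loss + gamma * kd

# Example: the loss is zero when the student matches the teacher and grows
# as the student's scores drift away from the teacher's.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In this sketch the teacher's scores are treated as fixed targets, so only the student would be updated during training. A larger `gamma` makes the student rely more on the teacher and less on its own task loss.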
Taxonomy
Methods: Knowledge Distillation
