Hessian-aware Quantized Node Embeddings for Recommendation
Huiyuan Chen, Kaixiong Zhou, Kwei-Herng Lai, Chin-Chia Michael Yeh, Yan Zheng, Xia Hu, Hao Yang

TL;DR
This paper introduces HQ-GNN, a Hessian-aware quantized graph neural network that compresses node embeddings into low-bit representations, reducing memory and inference time while maintaining high recommendation accuracy.
Contribution
The paper proposes a novel Hessian-aware quantization method for GNNs that improves gradient stability and performance in discrete node embedding representations.
Findings
HQ-GNN reduces memory usage and inference latency.
It achieves comparable recommendation accuracy to full-precision GNNs.
The method demonstrates effectiveness on large-scale datasets.
Abstract
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in recommender systems. Nevertheless, the process of searching and ranking from a large item corpus usually requires high latency, which limits the widespread deployment of GNNs in industry-scale applications. To address this issue, many methods compress user/item representations into the binary embedding space to reduce space requirements and accelerate inference. Also, they use the Straight-through Estimator (STE) to prevent vanishing gradients during back-propagation. However, the STE often causes the gradient mismatch problem, leading to sub-optimal results. In this work, we present the Hessian-aware Quantized GNN (HQ-GNN) as an effective solution for discrete representations of users/items that enable fast retrieval. HQ-GNN is composed of two components: a GNN encoder for learning continuous node embeddings…
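To make the quantization and STE terminology in the abstract concrete, here is a minimal NumPy sketch of a generic uniform low-bit quantizer with a straight-through backward pass. It is not the HQ-GNN method: the Hessian-aware component is not described in the excerpt above and is not modeled here. The function names (`quantize_uniform`, `ste_backward`), the min/max scaling rule, and the toy embedding matrix are all assumptions chosen for illustration.

```python
import numpy as np


def quantize_uniform(x, bits):
    """Uniformly quantize x to 2**bits levels (illustrative, assumed scheme).

    Returns the integer codes, the dequantized values, and the step size.
    """
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    step = (hi - lo) / levels
    if step == 0:  # constant input: any positive step works
        step = 1.0
    codes = np.round((x - lo) / step).astype(np.int64)  # values in [0, levels]
    dequant = codes * step + lo
    return codes, dequant, step


def ste_backward(grad_output):
    """Straight-through estimator: treat rounding as the identity map.

    The true derivative of rounding is zero almost everywhere, so passing
    the gradient through unchanged is an approximation. That gap is the
    gradient mismatch the abstract refers to.
    """
    return grad_output


# Toy "node embeddings": 4 nodes, 8 dimensions (hypothetical data).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))

codes, deq, step = quantize_uniform(emb, bits=2)
grad_in = ste_backward(np.ones_like(emb))
```

With `bits=2` each entry is stored as one of four codes, and the reconstruction error per entry is bounded by half a quantization step. The backward pass returns the incoming gradient untouched, which keeps training possible but ignores the rounding operation entirely.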
