Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items
Aleksandr V. Petrov, Craig Macdonald, Nicola Tonellotto

TL;DR
This paper enhances the inference efficiency of large-scale Transformer-based sequential recommendation models by applying the PQTopK algorithm to RecJPQ, achieving significant speedups and enabling practical deployment with millions of items.
Contribution
The paper applies the PQTopK algorithm to RecJPQ-based models, significantly improving inference speed for large item catalogues.
Findings
RecJPQ reduces memory consumption by up to 50x.
Applying PQTopK speeds up inference of RecJPQ-based SASRec by 4.5x on large datasets.
PQTopK remains efficient with catalogues of tens of millions of items.
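To see where RecJPQ's memory savings come from, the sketch below compares the size of a dense item embedding table with a RecJPQ-style one. It assumes each item gets one sub-item ID per split, with each split holding a shared set of sub-item embeddings of size dim // splits. The function names and the example configuration are hypothetical illustrations, not the paper's setup.

```python
def dense_table_bytes(num_items: int, dim: int, float_bytes: int = 4) -> int:
    """Bytes for a conventional item embedding table: one dim-sized float vector per item."""
    return num_items * dim * float_bytes


def recjpq_table_bytes(num_items: int, dim: int, splits: int, subids_per_split: int,
                       float_bytes: int = 4, code_bytes: int = 1) -> int:
    """Bytes for a RecJPQ-style table (hypothetical accounting).

    Each item stores only `splits` small integer codes (one sub-item ID per split);
    1 byte per code suffices when subids_per_split <= 256. The float parameters live
    in a shared table of splits * subids_per_split sub-item vectors of size dim // splits.
    """
    codes = num_items * splits * code_bytes
    sub_embeddings = splits * subids_per_split * (dim // splits) * float_bytes
    return codes + sub_embeddings


# Hypothetical configuration: 1M items, 512-dim embeddings, 8 splits, 256 sub-item IDs per split.
dense = dense_table_bytes(1_000_000, 512)
compressed = recjpq_table_bytes(1_000_000, 512, splits=8, subids_per_split=256)
print(dense, compressed)
```

Under these made-up settings, the dense table needs 2,048,000,000 bytes and the compressed one 8,524,288 bytes. That ratio covers only the embedding table in this toy configuration, so it should not be compared directly with the paper's reported figure. Note that the shared sub-item table does not grow with the catalogue; only the per-item codes do.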
Abstract
Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method for reducing the models' memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction in memory consumption by a factor of up to 50x, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Upon analysing RecJPQ's scoring algorithm, we find that its efficiency is limited by its use of score accumulators…
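The sketch below shows why sub-item IDs make fast scoring possible. It follows the general product-quantisation idea behind PQTopK as summarised here; it is not the authors' implementation. It assumes that an item's embedding is the concatenation of its sub-item embeddings, one per split. Under that assumption, the dot product with the sequence embedding decomposes into a sum of per-split dot products. The function `pq_topk` and its argument layout are hypothetical.

```python
import heapq
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pq_topk(query: Sequence[float],
            codes: Sequence[Sequence[int]],
            sub_embeddings: Sequence[Sequence[Sequence[float]]],
            k: int) -> List[int]:
    """Return the IDs of the k highest-scoring items (hypothetical sketch).

    query:          d-dim sequence embedding, split into m equal chunks.
    codes:          codes[i][s] is the sub-item ID of item i in split s.
    sub_embeddings: sub_embeddings[s][c] is the (d/m)-dim vector of sub-item c in split s.
    """
    m = len(sub_embeddings)
    chunk = len(query) // m
    # Step 1: score every sub-item once (m * b dot products, independent of catalogue size).
    sub_scores = [
        [dot(query[s * chunk:(s + 1) * chunk], emb) for emb in sub_embeddings[s]]
        for s in range(m)
    ]
    # Step 2: an item's score is the sum of its sub-item scores (m lookups and additions per item).
    item_scores = (
        (sum(sub_scores[s][c] for s, c in enumerate(item_codes)), item_id)
        for item_id, item_codes in enumerate(codes)
    )
    # Step 3: keep the k best items with a bounded heap.
    return [item_id for _, item_id in heapq.nlargest(k, item_scores)]


# Toy example: d = 4, m = 2 splits, 2 sub-item IDs per split, 4 items.
query = [1.0, 0.0, 0.0, 2.0]
sub_embeddings = [[[1.0, 0.0], [0.0, 1.0]],   # split 0
                  [[0.0, 1.0], [1.0, 0.0]]]   # split 1
codes = [[0, 0], [1, 0], [1, 1], [0, 1]]
print(pq_topk(query, codes, sub_embeddings, k=2))
```

The expensive dot products are computed once per sub-item rather than once per item. Per-item work then reduces to m table lookups and additions, which is what lets this style of scoring scale to very large catalogues.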
