Efficient Inference of Sub-Item Id-based Sequential Recommendation Models with Millions of Items

Aleksandr V. Petrov; Craig Macdonald; Nicola Tonellotto

arXiv:2408.09992 · cs.IR · August 20, 2024

TL;DR

This paper improves the inference efficiency of large-scale Transformer-based sequential recommendation models by applying the PQTopK algorithm to RecJPQ-based models, which makes them practical to deploy with catalogues of millions of items.

Contribution

The paper applies the PQTopK algorithm to RecJPQ-based models, substantially improving inference speed for large item catalogues.
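
The scoring idea can be sketched as follows, assuming (as in RecJPQ) that each item embedding is the concatenation of m sub-item embeddings, one per split. This is a minimal pure-Python illustration of the product-quantisation scoring pattern that PQTopK is built around, not the authors' implementation, which would run vectorised on tensors. The function names `pq_topk` and `naive_scores` and all sizes are hypothetical. Rather than reconstructing every item embedding and taking a full d-dimensional dot product per item, the sequence embedding is split into m parts, each part is scored once against every sub-item of its split, and each item's score is then the sum of its m precomputed sub-item scores.

```python
import heapq
import random


def pq_topk(seq_emb, sub_item_embs, codes, k):
    """Return (top-k item ids, all item scores) using precomputed sub-item scores.

    seq_emb: sequence embedding of length d.
    sub_item_embs[s][c]: embedding (length d // m) of sub-item c in split s.
    codes[i][s]: sub-item id of item i in split s.
    """
    m = len(sub_item_embs)
    sub_dim = len(seq_emb) // m
    # Step 1: score each split of the sequence embedding against every sub-item of that split.
    sub_scores = []
    for s in range(m):
        part = seq_emb[s * sub_dim:(s + 1) * sub_dim]
        sub_scores.append([sum(a * b for a, b in zip(part, emb)) for emb in sub_item_embs[s]])
    # Step 2: an item's score is the sum of its m sub-item scores (table lookups only).
    scores = [sum(sub_scores[s][code[s]] for s in range(m)) for code in codes]
    # Step 3: select the k highest-scoring item ids.
    top = heapq.nlargest(k, range(len(codes)), key=scores.__getitem__)
    return top, scores


def naive_scores(seq_emb, sub_item_embs, codes):
    """Reference: rebuild each item embedding by concatenation, then take a full dot product."""
    scores = []
    for code in codes:
        item_emb = [x for s, c in enumerate(code) for x in sub_item_embs[s][c]]
        scores.append(sum(a * b for a, b in zip(seq_emb, item_emb)))
    return scores


# Hypothetical toy sizes: d=8 dims, m=4 splits, C=16 sub-items per split, 1000 items.
random.seed(0)
d, m, C, n_items = 8, 4, 16, 1000
sub_item_embs = [[[random.gauss(0, 1) for _ in range(d // m)] for _ in range(C)] for _ in range(m)]
codes = [[random.randrange(C) for _ in range(m)] for _ in range(n_items)]
seq_emb = [random.gauss(0, 1) for _ in range(d)]

top, fast = pq_topk(seq_emb, sub_item_embs, codes, k=10)
slow = naive_scores(seq_emb, sub_item_embs, codes)
print(top)
```

Under these assumptions, step 1 costs C·d multiply-adds regardless of catalogue size (m splits × C sub-items × d/m dimensions), while step 2 costs N·m additions for N items instead of the N·d multiply-adds of the naive approach, which is where the savings for large catalogues come from.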

Findings

1. RecJPQ reduces memory consumption by up to 50x.
2. Applying PQTopK speeds up SASRec by 4.5x on large datasets.
3. PQTopK remains efficient with catalogues of tens of millions of items.
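
A back-of-the-envelope calculation shows why sub-item decomposition matters at this scale. The sketch below counts only the item embedding table, using hypothetical parameters (float32 embeddings, one-byte codes); it ignores the Transformer layers, so its ratio is not comparable to the reported 50x figure. The helper names are illustrative.

```python
def full_table_bytes(n_items, d, bytes_per_float=4):
    """Conventional table: one d-dimensional float embedding per item."""
    return n_items * d * bytes_per_float


def sub_item_table_bytes(n_items, d, m, c, bytes_per_code=1, bytes_per_float=4):
    """m sub-item codes per item plus m codebooks of c sub-item embeddings of size d // m.

    bytes_per_code=1 assumes c <= 256, so each code fits in one unsigned byte.
    """
    code_bytes = n_items * m * bytes_per_code
    codebook_bytes = m * c * (d // m) * bytes_per_float
    return code_bytes + codebook_bytes


# Hypothetical configuration: 10 million items, d=256, m=8 splits, c=256 sub-items per split.
full = full_table_bytes(10_000_000, 256)
compressed = sub_item_table_bytes(10_000_000, 256, 8, 256)
print(f"full: {full / 1e9:.2f} GB, sub-item: {compressed / 1e6:.2f} MB, ratio: {full / compressed:.1f}x")
```

With these assumed numbers the table shrinks from 10.24 GB to about 80 MB, and almost all of the remainder is the per-item codes; the shared codebooks themselves occupy only 262,144 bytes.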

Abstract

Transformer-based recommender systems, such as BERT4Rec or SASRec, achieve state-of-the-art results in sequential recommendation. However, it is challenging to use these models in production environments with catalogues of millions of items: scaling Transformers beyond a few thousand items is problematic for several reasons, including high model memory consumption and slow inference. In this respect, RecJPQ is a state-of-the-art method of reducing the models' memory consumption; RecJPQ compresses item catalogues by decomposing item IDs into a small number of shared sub-item IDs. Despite reporting a reduction in memory consumption by a factor of up to 50x, the original RecJPQ paper did not report inference efficiency improvements over the baseline Transformer-based models. Upon analysing RecJPQ's scoring algorithm, we find that its efficiency is limited by its use of score accumulators…
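
The decomposition of item IDs into shared sub-item IDs can be illustrated with a minimal sketch, assuming that each item receives one sub-item ID per split and that its embedding is the concatenation of the corresponding sub-item embeddings. The helper names `assign_codes` and `item_embedding` are hypothetical, and codes are assigned at random purely for illustration; RecJPQ's own code-assignment strategy is not reproduced here.

```python
import random


def assign_codes(n_items, m, c, seed=0):
    """Map every item id to m sub-item ids, one per split (random, for illustration only)."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(c) for _ in range(m)) for _ in range(n_items)]


def item_embedding(item_id, codes, codebooks):
    """Rebuild an item embedding by concatenating its sub-item embeddings in split order."""
    return [x for split, sub_id in enumerate(codes[item_id]) for x in codebooks[split][sub_id]]


# Hypothetical sizes: 100,000 items, m=4 splits, c=8 sub-items per split, sub-embedding size 2 (d=8).
n_items, m, c, sub_dim = 100_000, 4, 8, 2
rng = random.Random(1)
codebooks = [[[rng.uniform(-1, 1) for _ in range(sub_dim)] for _ in range(c)] for _ in range(m)]
codes = assign_codes(n_items, m, c)

# Only m * c = 32 distinct sub-item embeddings are stored, yet every item gets a d=8 vector.
emb = item_embedding(42, codes, codebooks)
print(len(emb), sum(len(book) for book in codebooks))
```

Because many items share each sub-item, items with the same sub-item ID in a split also share that slice of their embedding, so the number of stored embedding parameters depends on m and c rather than on the catalogue size.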
