TL;DR
This paper investigates how transformer-based models scale in sequential recommendation tasks, revealing scaling laws similar to language models and providing guidelines for efficient training and deployment in real-world systems.
Contribution
The paper demonstrates that scaling laws apply to transformer-based sequential recommendation and offers a compute-optimal training strategy based on large-scale Amazon data.
Findings
Scaling behaviors similar to language models are observed in recommendation.
Performance improvements transfer effectively to downstream tasks.
Guidelines for compute-efficient training and inference are provided.
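As a rough illustration of what "scaling behavior similar to language models" usually means, the sketch below fits a power law of the form loss = a * N^(-alpha) to loss measurements at several model sizes, using linear regression in log-log space. The functional form, the function name fit_power_law, and every number here are assumptions made for illustration. The data is synthetic and is not taken from the paper, whose exact parameterization may differ.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-alpha) via least squares in log-log space.

    Taking logs gives log(loss) = log(a) - alpha * log(size), which is
    linear in log(size), so a degree-1 polynomial fit recovers both terms.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic illustration only: these values are made up, not paper results.
rng = np.random.default_rng(0)
true_a, true_alpha = 5.0, 0.08
sizes = np.array([1e5, 1e6, 1e7, 1e8, 1e9])  # hypothetical parameter counts
noise = np.exp(rng.normal(0.0, 0.005, sizes.size))  # small multiplicative noise
losses = true_a * sizes ** (-true_alpha) * noise

a_hat, alpha_hat = fit_power_law(sizes, losses)
print(f"fitted a={a_hat:.3f}, alpha={alpha_hat:.3f}")
```

A fitted exponent like alpha_hat can then be used to extrapolate the expected loss at a larger model size, which is the kind of guidance a scaling law is meant to provide. A pure power law without an irreducible-loss offset is the simplest possible choice and may not match the form used in the paper.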
Abstract
Modeling user preferences has been mainly addressed by looking at users' interaction history with the different elements available in the system. Tailoring content to individual preferences based on historical data is the main goal of sequential recommendation. The nature of the problem, as well as the good performance observed across various domains, has motivated the use of the transformer architecture, which has proven effective in leveraging increasingly larger amounts of training data when accompanied by an increase in the number of model parameters. This scaling behavior has brought a great deal of attention, as it provides valuable guidance in the design and training of even larger models. Taking inspiration from the scaling laws observed in training large language models, we explore similar principles for sequential recommendation. We use the full Amazon Product Data…
