Scaling Recommender Transformers to One Billion Parameters

Kirill Khrylchenko; Artem Matveev; Sergei Makeev; Vladimir Baikalov

arXiv:2507.15994 · cs.IR · February 19, 2026

TL;DR

This paper demonstrates how to train and deploy large-scale transformer recommender models with up to one billion parameters, significantly improving recommendation quality on a real-world music platform.

Contribution

The paper introduces a scalable training recipe for transformer recommenders with up to one billion parameters and shows that the autoregressive learning task can be decomposed effectively.

Findings

01 Successfully deployed on a large-scale music platform

02 Online A/B tests show a +2.26% increase in total listening time

03 Online A/B tests show a +6.37% increase in the likelihood of user likes

Abstract

While large transformer models have been used successfully in many real-world applications, such as natural language processing, computer vision, and speech processing, scaling transformers for recommender systems remains a challenging problem. Recently, the Generative Recommenders framework was proposed to scale beyond typical Deep Learning Recommendation Models (DLRMs). Reformulating recommendation as a sequential transduction task improved scaling properties in terms of compute. Nevertheless, the largest encoder configuration reported by the HSTU authors amounts to only ~176 million parameters, which is considerably smaller than the hundreds of billions or even trillions of parameters common in modern language models. In this work, we present a recipe for training large transformer recommenders with up to a billion parameters. We show that autoregressive learning on user…

Taxonomy

Topics: Advanced Text Analysis Techniques