TurboGR: An Accelerated Training System for Large-Scale Generative Recommendation

Huichao Chai; Zhixin Wu; Xuemiao Li; Shiqing Fan; Hengfeng Wang; Maojun Peng; Lu Xu; Yaoyuan Wang; Yibo Jin; Wei Guo; Yongxiang Feng

arXiv:2605.13433·cs.DC·May 14, 2026

TL;DR

TurboGR introduces an optimized training system for large-scale generative recommendation on Ascend NPUs, overcoming system bottlenecks with innovative acceleration, communication, and negative sampling techniques.

Contribution

The paper presents TurboGR, a system that systematically addresses Ascend NPU challenges for scalable generative recommendation training with three core innovations.

Findings

01

Supports training up to 0.2B parameters with high efficiency.

02

Achieves 54.71% MFU and near-linear scalability (0.97).

03

Reduces inter-device imbalance from 47% to 2.4%.

Abstract

Generative recommendation (GR) has emerged as a promising paradigm that replaces fragmented, scenario-specific architectures with unified Transformer-based models, exhibiting scaling-law behavior where recommendation quality improves systematically with increased model capacity and training data. However, deploying GR at scale on Ascend NPUs faces fundamental system-level challenges. These challenges are further exacerbated on Ascend NPUs due to the absence of high-performance implementations for jagged operators and the architectural mismatch between irregular sparse primitives and the NPU's dense-computation-optimized design. In this paper, we present TurboGR, an Ascend-affinity training system for generative recommendation that systematically addresses these bottlenecks through three core innovations: (i) Ascend-affinity jagged acceleration, including fusion operators that eliminate…
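
To make the "jagged" terminology in the abstract concrete, the following is a minimal sketch of the general idea, not TurboGR's implementation. It assumes the common values-plus-offsets layout for variable-length user histories and contrasts it with dense padding. The function names (`to_jagged`, `to_padded`, `padding_waste`) and the sample data are hypothetical and chosen only for illustration.

```python
# Illustrative only: a generic jagged (values + offsets) layout versus dense
# padding. All names and data here are hypothetical, not taken from the paper.

def to_jagged(sequences):
    """Flatten variable-length sequences into a values list and an offsets list.

    Sequence i occupies values[offsets[i]:offsets[i + 1]].
    """
    values, offsets = [], [0]
    for seq in sequences:
        values.extend(seq)
        offsets.append(len(values))
    return values, offsets


def to_padded(sequences, pad=0):
    """Pad every sequence to the length of the longest one."""
    max_len = max((len(s) for s in sequences), default=0)
    return [list(s) + [pad] * (max_len - len(s)) for s in sequences]


def padding_waste(sequences):
    """Fraction of slots in the padded layout that hold padding, not data."""
    max_len = max((len(s) for s in sequences), default=0)
    total = len(sequences) * max_len
    real = sum(len(s) for s in sequences)
    return 0.0 if total == 0 else 1.0 - real / total


# Three users with interaction histories of very different lengths.
histories = [[11, 12, 13, 14, 15, 16], [21], [31, 32]]

values, offsets = to_jagged(histories)
padded = to_padded(histories)
waste = padding_waste(histories)

# Recover the second user's history from the jagged layout.
second_user = values[offsets[1]:offsets[2]]
```

In this toy batch the padded layout spends half of its slots on padding, while the jagged layout stores only real items at the cost of irregular, offset-driven access. That trade-off is the kind of mismatch with dense-computation hardware the abstract refers to.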
