Scaling Generative Recommendations with Context Parallelism on Hierarchical Sequential Transducers

Yue Dong; Han Li; Shen Li; Nikhil Patel; Xing Liu; Xiaodong Wang; Chuanhao Zhuge

arXiv:2508.04711 · cs.IR · August 19, 2025

TL;DR

This paper introduces context parallelism with jagged tensor support for Hierarchical Sequential Transducers, enabling significant scaling of user interaction sequence length in recommendation systems while maintaining efficiency.

Contribution

It presents a novel context parallelism method tailored for jagged tensors in HSTU, allowing scalable sequence modeling in large-scale recommendation systems.

Findings

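01. Achieved a 5.3x increase in supported user interaction sequence length.

02. Realized a 1.55x scaling factor when context parallelism is combined with Distributed Data Parallelism (DDP).

03. Enabled HSTU to attend to longer user histories, which the abstract links to significant metric improvements.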

Abstract

Large-scale recommendation systems are pivotal for processing an immense volume of daily user interactions, and they require effective modeling of high-cardinality, heterogeneous features to ensure accurate predictions. In prior work, we introduced Hierarchical Sequential Transducers (HSTU), an attention-based architecture for modeling high-cardinality, non-stationary streaming recommendation data that exhibits good scaling-law behavior in the generative recommender (GR) framework. Recent studies and experiments demonstrate that attending to longer user history sequences yields significant metric improvements. However, scaling sequence length is activation-heavy, necessitating parallelism solutions that effectively shard activation memory. In transformer-based LLMs, context parallelism (CP) is a commonly used technique that distributes computation along the sequence-length dimension across multiple GPUs,…
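To make the idea of sharding along the sequence-length dimension concrete, the following is a minimal sketch of how a jagged batch (variable-length user histories stored as one flat list plus offsets) could be divided across context-parallel ranks. It is an illustration under stated assumptions, not the paper's implementation: the function name `shard_jagged`, the flat values-plus-offsets layout, and the naive contiguous split are all hypothetical choices made for this example, and the sketch ignores attention computation, communication, and load balancing.

```python
from typing import List, Tuple

Shard = Tuple[List[int], List[int]]  # (flat values, offsets) held by one rank


def shard_jagged(values: List[int], offsets: List[int], world_size: int) -> List[Shard]:
    """Split every sequence of a jagged batch into `world_size` contiguous chunks.

    Hypothetical helper for illustration only. Rank r receives the r-th chunk
    of each sequence, so each rank holds roughly 1/world_size of every user's
    history and therefore roughly 1/world_size of the activations.
    """
    shards: List[Shard] = [([], [0]) for _ in range(world_size)]
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Distribute the remainder over the first `extra` ranks.
        base, extra = divmod(end - start, world_size)
        cursor = start
        for rank in range(world_size):
            chunk_len = base + (1 if rank < extra else 0)
            rank_values, rank_offsets = shards[rank]
            rank_values.extend(values[cursor:cursor + chunk_len])
            rank_offsets.append(rank_offsets[-1] + chunk_len)
            cursor += chunk_len
    return shards


# Three user histories of lengths 5, 2, and 3, stored without padding.
values = list(range(10))
offsets = [0, 5, 7, 10]
shards = shard_jagged(values, offsets, world_size=2)

for rank, (rank_values, rank_offsets) in enumerate(shards):
    print(f"rank {rank}: values={rank_values} offsets={rank_offsets}")
```

Each rank keeps its own offsets, so the per-rank data is still jagged and needs no padding. A contiguous split like this one is the simplest choice; with causal attention it leaves later chunks with more work than earlier ones, so a practical system would likely use a different assignment of chunks to ranks.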

Taxonomy

Topics: Recommender Systems and Techniques · Topic Modeling · Speech Recognition and Synthesis