Scaling Generative Recommendations with Context Parallelism on Hierarchical Sequential Transducers
Yue Dong, Han Li, Shen Li, Nikhil Patel, Xing Liu, Xiaodong Wang, Chuanhao Zhuge

TL;DR
This paper introduces context parallelism with jagged tensor support for Hierarchical Sequential Transducers, enabling a 5.3x increase in the supported user-interaction sequence length for recommendation systems while maintaining efficiency.
Contribution
It presents a novel context parallelism method tailored for jagged tensors in HSTU, allowing scalable sequence modeling in large-scale recommendation systems.
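Here is a minimal NumPy sketch of what sharding jagged tensors for context parallelism can look like. It is illustrative only and rests on stated assumptions. A batch of variable-length user histories is stored as one packed `values` array plus an `offsets` array, so sequence b spans `values[offsets[b]:offsets[b+1]]`. The function `shard_jagged_for_cp` is hypothetical, and so is its load-balancing rule. That rule cuts each sequence into 2*cp_size chunks, and rank r keeps chunks r and 2*cp_size-1-r. This scheme is common in causal context parallelism for LLM training. It is not taken from the authors' implementation.

```python
import numpy as np


def shard_jagged_for_cp(values, offsets, cp_size):
    """Split every variable-length sequence in a jagged batch across cp_size ranks.

    Hypothetical load-balanced rule: each sequence is cut into 2 * cp_size
    near-equal chunks, and rank r keeps chunks r and 2 * cp_size - 1 - r, so every
    rank holds a mix of early (cheap under a causal mask) and late (expensive)
    positions. Returns one (local_values, local_offsets) jagged pair per rank.
    """
    n_chunks = 2 * cp_size
    per_rank_vals = [[] for _ in range(cp_size)]
    per_rank_lens = [[] for _ in range(cp_size)]
    for b in range(len(offsets) - 1):
        seq = values[offsets[b]:offsets[b + 1]]
        # array_split tolerates lengths not divisible by n_chunks (chunk sizes
        # differ by at most 1; very short sequences yield some empty chunks).
        chunks = np.array_split(seq, n_chunks)
        for r in range(cp_size):
            local = np.concatenate([chunks[r], chunks[n_chunks - 1 - r]])
            per_rank_vals[r].append(local)
            per_rank_lens[r].append(len(local))
    shards = []
    for r in range(cp_size):
        # Each rank rebuilds its own offsets so sequence boundaries survive sharding.
        local_offsets = np.concatenate([[0], np.cumsum(per_rank_lens[r])]).astype(np.int64)
        local_values = np.concatenate(per_rank_vals[r]) if per_rank_vals[r] else values[:0]
        shards.append((local_values, local_offsets))
    return shards
```

As an example, take two sequences of lengths 7 and 3 with `cp_size=2`. Rank 0 receives rows [0, 1, 6, 7] and rank 1 receives rows [2, 3, 4, 5, 8, 9]. Each rank keeps its own offsets, so downstream jagged kernels can still tell where one user's history ends.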
Findings
Achieved a 5.3x increase in supported sequence length.
Achieved a 1.55x scaling factor when combined with Distributed Data Parallelism (DDP).
Enhanced the ability to model longer user histories effectively.
Abstract
Large-scale recommendation systems are pivotal for processing an immense volume of daily user interactions, requiring effective modeling of high-cardinality, heterogeneous features to ensure accurate predictions. In prior work, we introduced Hierarchical Sequential Transducers (HSTU), an attention-based architecture for modeling high-cardinality, non-stationary streaming recommendation data that exhibits favorable scaling laws within the generative recommender (GR) framework. Recent studies and experiments demonstrate that attending to longer user history sequences yields significant metric improvements. However, scaling sequence length is activation-heavy, necessitating parallelism solutions to effectively shard activation memory. In transformer-based LLMs, context parallelism (CP) is a commonly used technique that distributes computation along the sequence-length dimension across multiple GPUs,…
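The sketch below shows the core idea of sharding attention along the sequence-length dimension. It assumes an HSTU-style pointwise attention, SiLU(Q K^T) V scaled by 1/n, with a causal mask and no relative attention bias. The exact kernel used in the paper is not described here. Each simulated rank owns a load-balanced subset of query positions and reads the full K and V, as if they had been all-gathered. A ring-style K/V exchange is a common alternative. This attention has no softmax, so in this sketch a rank's output rows need no cross-rank normalization statistics. All helper names are hypothetical. A real CP implementation runs the per-rank loop on separate GPUs with collective communication.

```python
import numpy as np


def silu(x):
    return x / (1.0 + np.exp(-x))


def pointwise_causal_attention(q, k, v, q_pos, k_pos, n):
    """Assumed HSTU-style attention: (SiLU(q k^T) * causal_mask) v / n.

    q_pos and k_pos are *global* token positions, so the causal mask stays
    correct even when q holds only a non-contiguous shard of the sequence.
    """
    scores = silu(q @ k.T)
    mask = q_pos[:, None] >= k_pos[None, :]
    return (scores * mask) @ v / n


def load_balanced_indices(n, cp_size):
    """Hypothetical split: rank r owns chunks r and 2*cp_size-1-r of 2*cp_size."""
    chunks = np.array_split(np.arange(n), 2 * cp_size)
    return [np.concatenate([chunks[r], chunks[2 * cp_size - 1 - r]]) for r in range(cp_size)]


def context_parallel_attention(q, k, v, cp_size):
    """Simulate CP on one sequence: each rank computes only its own query rows."""
    n = q.shape[0]
    pos = np.arange(n)
    out = np.empty((n, v.shape[1]))
    for rank_idx in load_balanced_indices(n, cp_size):
        # On a real GPU rank, k and v would arrive via an all-gather over the CP group.
        out[rank_idx] = pointwise_causal_attention(q[rank_idx], k, v, rank_idx, pos, n)
    return out
```

On a random 12-token sequence with `cp_size=3`, stitching the per-rank rows back together should reproduce the unsharded output up to floating-point rounding. Each rank materializes only its slice of the n x n score matrix, which is where the activation-memory savings come from.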
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Speech Recognition and Synthesis
