Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

Jiaqi Zhai; Lucy Liao; Xing Liu; Yueming Wang; Rui Li; Xuan Cao; Leon Gao; Zhaojie Gong; Fangda Gu; Michael He; Yinghai Lu; Yu Shi

arXiv:2402.17152·cs.LG·May 7, 2024·5 cites

TL;DR

This paper introduces HSTU, an architecture for trillion-parameter generative recommendation models based on sequential transduction. HSTU outperforms baselines by up to 65.8% in NDCG, runs 5.3x to 15.2x faster than FlashAttention2-based Transformers, and yields quality improvements that scale with training compute.

Contribution

The paper proposes a novel architecture, HSTU, for large-scale recommendation systems, reformulating them as sequential transduction tasks within a generative modeling framework.

Findings

01. HSTU outperforms baselines on synthetic and public datasets by up to 65.8% in NDCG.

02. HSTU is 5.3x to 15.2x faster than FlashAttention2-based Transformers.

03. The quality of Generative Recommenders scales as a power law of training compute, up to GPT-3/LLaMa-2 scale.
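
Since the headline accuracy finding is reported in NDCG, a minimal sketch of how NDCG@k is commonly computed may help. This is an illustration under assumptions, not the paper's evaluation code: it assumes binary relevance, and the function names `dcg_at_k` and `ndcg_at_k`, the cutoff `k`, and the example items are hypothetical. The summary above does not state which cutoff or relevance definition the authors used.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions of a ranked list."""
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k with binary relevance (assumption): gain is 1 for a relevant item, else 0."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # The ideal ranking places every relevant item first, limited to k positions.
    ideal = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical example: the single relevant item is ranked second.
score = ndcg_at_k(["b", "a", "c"], {"a"}, k=3)
```

A score of 1.0 means every relevant item sits at the top of the ranking; pushing relevant items further down lowers the score logarithmically.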

Abstract

Large-scale recommendation systems are characterized by their reliance on high-cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on a huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by the success achieved by Transformers in the language and vision domains, we revisit fundamental design choices in recommendation systems. We reformulate recommendation problems as sequential transduction tasks within a generative modeling framework ("Generative Recommenders"), and propose a new architecture, HSTU, designed for high-cardinality, non-stationary streaming recommendation data. HSTU outperforms baselines over synthetic and public datasets by up to 65.8% in NDCG, and is 5.3x to 15.2x faster than FlashAttention2-based…
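
To make the "sequential transduction" framing concrete, here is a rough sketch of how a chronological stream of user actions can be turned into autoregressive training pairs, where each prefix of the history predicts the next action. This is a generic next-action setup assumed for illustration, not the paper's exact formulation: the helper `next_action_pairs`, the `max_len` context window, and the action labels are all hypothetical.

```python
from typing import List, Tuple

def next_action_pairs(actions: List[str], max_len: int) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs from one user's chronologically ordered actions.

    Each target is the action at step t; its context is the up-to-max_len
    actions that came immediately before it, so no future action leaks in.
    """
    pairs = []
    for t in range(1, len(actions)):
        context = actions[max(0, t - max_len):t]
        pairs.append((context, actions[t]))
    return pairs

# Hypothetical action stream for a single user.
history = ["view:item1", "like:item1", "view:item7", "share:item7"]
pairs = next_action_pairs(history, max_len=2)
```

Each pair asks the model to generate the next action given the recent history, which is the same shape of problem that causal language models solve over tokens.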

Taxonomy

Topics: Recommender Systems and Techniques · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis