FuXi-β: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model

Yufei Ye; Wei Guo; Hao Wang; Hong Zhu; Yuyang Ye; Yong Liu; Huifeng Guo; Ruiming Tang; Defu Lian; Enhong Chen

arXiv:2508.10615 · cs.IR · August 15, 2025

TL;DR

FuXi-β is a lightweight, fast, large-scale generative recommendation model that improves both efficiency and accuracy by removing redundant attention components and introducing new attention mechanisms.

Contribution

The paper proposes a new framework for Transformer-like recommendation models and, within it, the FuXi-β model, whose new attention modules improve both speed and accuracy.

Findings

1. FuXi-β outperforms previous models on multiple datasets.
2. It achieves a 27-47% improvement in NDCG@10 on large-scale datasets.
3. It significantly accelerates training and inference while maintaining scalability.

Abstract

Scaling laws for autoregressive generative recommenders reveal potential for larger, more versatile systems but mean greater latency and training costs. To accelerate training and inference, we investigated the recent generative recommendation models HSTU and FuXi-α, identifying two efficiency bottlenecks: the indexing operations in relative temporal attention bias and the computation of the query-key attention map. Additionally, we observed that relative attention bias in self-attention mechanisms can also serve as attention maps. Previous works like Synthesizer have shown that alternative forms of attention maps can achieve similar performance, naturally raising the question of whether some attention maps are redundant. Through empirical experiments, we discovered that using the query-key attention map might degrade the model's performance in recommendation tasks. To address…
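To make the abstract's observation concrete, the sketch below treats a relative temporal bias as the attention map itself, so no query-key product is computed. It assumes a log2-bucketed time-gap lookup table in the spirit of HSTU's relative temporal bias; the bucketing rule, the table values, the per-row averaging, and every function name (`bucketize`, `temporal_attention_map`, `apply_attention`) are hypothetical illustrations, not the FuXi-β module, whose details lie beyond this truncated abstract. The nested per-pair lookup also shows where the indexing cost flagged in the abstract comes from.

```python
import math
from typing import List

Matrix = List[List[float]]


def bucketize(dt: float, num_buckets: int) -> int:
    """Map a non-negative time gap to a log2-scale bucket (hypothetical scheme)."""
    bucket = int(math.log2(1.0 + max(dt, 0.0)))
    return min(bucket, num_buckets - 1)  # clamp very long gaps into the last bucket


def temporal_attention_map(timestamps: List[float], table: List[float]) -> Matrix:
    """Build an n x n causal attention map from time gaps alone.

    Entry (i, j) with j <= i is table[bucket(t_i - t_j)], one scalar per bucket
    (learned in a real model, fixed here). This per-pair lookup is the kind of
    indexing operation the abstract identifies as a bottleneck.
    Entries with j > i stay 0, acting as a causal mask.
    """
    n = len(timestamps)
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            attn[i][j] = table[bucketize(timestamps[i] - timestamps[j], len(table))]
    return attn


def apply_attention(attn: Matrix, values: Matrix) -> Matrix:
    """Mix value rows with the map; no query-key product is computed anywhere.

    Each output row is divided by the number of visible positions (i + 1),
    a simple averaging choice made for this sketch only.
    """
    n, d = len(values), len(values[0])
    out = []
    for i in range(n):
        row = [sum(attn[i][j] * values[j][k] for j in range(n)) for k in range(d)]
        out.append([x / (i + 1) for x in row])
    return out


if __name__ == "__main__":
    ts = [0.0, 1.0, 3.0]             # event timestamps, oldest first
    table = [1.0, 0.5, 0.25, 0.125]  # hypothetical bias value per time bucket
    vals = [[1.0], [2.0], [4.0]]     # one-dimensional value vectors
    attn = temporal_attention_map(ts, table)
    print(attn)                        # [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.25, 0.5, 1.0]]
    print(apply_attention(attn, vals)) # [[1.0], [1.25], [1.75]]
```

Replacing the time-gap table with a table indexed by position offset gives a positional variant of the same idea; how FuXi-β actually combines or replaces such maps is described in the full paper rather than in this excerpt.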
