LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

Zheng Chai; Qin Ren; Xijun Xiao; Huizhi Yang; Bo Han; Sijun Zhang; Di Chen; Hui Lu; Wenlin Zhao; Lele Yu; Xionghang Xie; Shiru Ren; Xiang Sun; Yaocheng Tan; Peng Xu; Yuchao Zheng; Di Wu

arXiv:2505.04421 · cs.IR · July 21, 2025

TL;DR

LONGER is a scalable transformer model designed for ultra-long user behavior sequences, improving efficiency and effectiveness in industrial recommender systems through innovative attention stabilization, token merging, and engineering optimizations.

Contribution

The paper introduces LONGER, a novel long-sequence transformer with global token mechanisms, hybrid attention, and engineering optimizations, enabling efficient industrial-scale recommender systems.

Findings

01 Outperforms strong baselines in offline metrics

02 Demonstrates significant improvements in online A/B tests

03 Successfully deployed in over 10 scenarios serving billions of users

Abstract

Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and a hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including mixed-precision training with activation recomputation, KV cache serving, and a fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER…
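
The abstract names token merging and global tokens but gives no implementation detail here, so the following is only a minimal sketch of the general idea and should not be read as the authors' code. It assumes that merging means concatenating the embeddings of adjacent behavior tokens, that the global tokens are extra vectors prepended to the merged sequence, and that attention is a plain single-head scaled dot-product without learned projections. The names `merge_tokens`, `self_attention`, `group_size`, and all sizes are hypothetical; the InnerTransformers and the hybrid attention strategy are not modeled.

```python
import numpy as np


def merge_tokens(seq, group_size):
    """Merge each run of `group_size` adjacent tokens by concatenating embeddings.

    Assumption: concatenation stands in for the merge module described in the abstract.
    """
    length, dim = seq.shape
    usable = (length // group_size) * group_size  # drop a ragged tail, for simplicity
    return seq[:usable].reshape(usable // group_size, group_size * dim)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with no learned projections."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ tokens, weights


rng = np.random.default_rng(0)

# Hypothetical sizes: 2000 behavior tokens with 16-dimensional embeddings.
behaviors = rng.normal(size=(2000, 16))

# Merging 8 adjacent tokens shortens the sequence from 2000 to 250 positions.
merged = merge_tokens(behaviors, group_size=8)

# Assumption: two global tokens (random here, learned in practice) are prepended
# so that every position can attend to them.
global_tokens = rng.normal(size=(2, merged.shape[1]))
tokens = np.concatenate([global_tokens, merged], axis=0)

output, weights = self_attention(tokens)

# The attention score matrix shrinks from 2000 x 2000 to 252 x 252 entries.
print(behaviors.shape[0] ** 2, tokens.shape[0] ** 2)
```

Under these assumptions, each merged token is wider (128 rather than 16 dimensions), so the saving comes from the number of query-key pairs, which falls quadratically with the merge factor.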

Taxonomy

Topics: Recommender Systems and Techniques · Information Retrieval and Search Behavior · Advanced Bandit Algorithms Research

Methods: Softmax · Attention Is All You Need