Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation

Lin Guan; Jia-Qi Yang; Zhishan Zhao; Beichuan Zhang; Bo Sun; Xuanyuan Luo; Jinan Ni; Xiaowen Li; Yuhang Qi; Zhifang Fan; Hangyu Wang; Qiwei Chen; Yi Cheng; Feng Zhang; Xiao Yang

arXiv:2511.06077·cs.LG·May 20, 2026

TL;DR

This paper presents an end-to-end industrial recommender for Douyin that models 10K-length user behavior sequences in production. It combines target-to-history cross attention with request-level batching to keep compute, storage, and latency within budget, and reports online performance improvements.

Contribution

It presents the Stacked Target-to-History Cross Attention (STCA), Request Level Batching (RLB), and a length-extrapolative training strategy for efficient ultra-long sequence modeling in production.
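To make the idea behind STCA concrete, here is a minimal sketch of target-to-history cross attention in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`cross_attention`, `stacked_target_attention`, `score_counts`), the single-head form without learned projections, and the residual update between layers are all hypothetical choices made for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, history):
    """One target-to-history step: a single query vector attends over L history vectors.

    Only L dot products are computed, so no L x L score matrix is ever built.
    """
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, item)) / math.sqrt(d) for item in history]
    weights = softmax(scores)
    return [sum(w * item[j] for w, item in zip(weights, history)) for j in range(d)]


def stacked_target_attention(target, history, num_layers=2):
    """Stack several cross-attention layers.

    Assumption: the query is refined with a plain residual add between layers.
    """
    query = list(target)
    for _ in range(num_layers):
        context = cross_attention(query, history)
        query = [q + c for q, c in zip(query, context)]
    return query


def score_counts(seq_len, num_layers):
    """Query-key dot products needed by history self-attention vs. a single target query."""
    return {
        "self_attention": num_layers * seq_len * seq_len,
        "target_cross_attention": num_layers * seq_len,
    }
```

Calling `score_counts(10000, 2)` shows the gap between the two designs at 10K length: the self-attention count grows with the square of the sequence length, while the target-to-history count grows linearly.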

Findings

01. Scaling to 10K-length histories improves engagement metrics.

02. STCA reduces attention complexity from quadratic to linear in sequence length, and RLB lowers sequence-related storage, communication, and compute.

03. The system maintains low latency while handling ultra-long sequences.

Abstract

Short-video recommenders such as Douyin must exploit extremely long user behavior histories without breaking latency or cost budgets. We present an end-to-end industrial recommender system that scales long-sequence recommendation modeling to 10K-length histories in production. First, we introduce Stacked Target-to-History Cross Attention (STCA), which replaces history self-attention with stacked cross-attention from the target to the history, reducing complexity from quadratic to linear in sequence length and enabling efficient end-to-end training over long user behavior sequences. Second, we propose Request Level Batching (RLB), a user-centric batching scheme that aggregates multiple targets for the same user/request to share the user-side encoding, substantially lowering sequence-related storage, communication, and compute without changing the learning objective. Third, we design a…
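The request-level batching idea described above can be sketched as follows. This is a toy illustration, not the production system: `encode_history` is a hypothetical stand-in for the expensive user-side sequence encoder (here just a mean of item vectors), and the sample layout with `request_id`, `history`, and `target` keys is assumed for the example.

```python
from collections import defaultdict


def encode_history(history):
    """Hypothetical stand-in for the costly user-side encoder: mean of the item vectors."""
    d = len(history[0])
    return [sum(item[j] for item in history) / len(history) for j in range(d)]


def score(user_vec, target_vec):
    """Dot-product score between the user encoding and one target."""
    return sum(u * t for u, t in zip(user_vec, target_vec))


def sample_level_scores(samples):
    """Baseline: every (request, target) sample re-encodes its own copy of the history."""
    scores, encoder_calls = [], 0
    for s in samples:
        user_vec = encode_history(s["history"])
        encoder_calls += 1
        scores.append(score(user_vec, s["target"]))
    return scores, encoder_calls


def request_level_scores(samples):
    """Group targets by request and encode the shared history once per group."""
    groups = defaultdict(list)
    for idx, s in enumerate(samples):
        groups[s["request_id"]].append(idx)
    scores, encoder_calls = [0.0] * len(samples), 0
    for indices in groups.values():
        user_vec = encode_history(samples[indices[0]]["history"])
        encoder_calls += 1
        for idx in indices:
            scores[idx] = score(user_vec, samples[idx]["target"])
    return scores, encoder_calls
```

Both functions return the same per-sample scores, which mirrors the claim that the learning objective is unchanged. Only the number of encoder calls differs: one per sample in the baseline, one per request with grouping.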
