LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling
Tianhe Lin, Ziwei Xiong, Baoyuan Ou, Yingjie Qin, Lai Xu, Xiaocheng Zhong, Yao Hu, Zhiyong Wang, Tao Zhou, Yubin Xu, Di Wu

TL;DR
LASER introduces a target-aware segmented attention framework with system and algorithmic innovations, significantly improving efficiency and effectiveness in modeling ultra-long user behavior sequences for recommendation systems.
Contribution
The paper presents LASER, a novel framework combining a schema-aware infrastructure and a segmented attention mechanism to efficiently model long sequences in real-time environments.
Findings
50% reduction in retrieval latency
75% decrease in CPU usage
Over 2% lift in revenue in online tests
Abstract
Modeling ultra-long user behavior sequences is pivotal for capturing evolving and lifelong interests in modern recommendation systems. However, deploying such models in real-time industrial environments faces a strict "Latency Wall", constrained by two distinct bottlenecks: the high I/O latency of retrieving massive user histories and the quadratic computational complexity of standard attention mechanisms. To break these bottlenecks, we present LASER, a full-stack optimization framework developed and deployed at Xiaohongshu (RedNote). Our approach tackles the challenges through two complementary innovations: (1) System efficiency: We introduce SeqVault, a unified schema-aware serving infrastructure for long user histories. By implementing a hybrid DRAM-SSD indexing strategy, SeqVault reduces retrieval latency by 50% and CPU usage by 75%, ensuring millisecond-level access to full…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Algorithms and Data Compression
