LASER: An Efficient Target-Aware Segmented Attention Framework for End-to-End Long Sequence Modeling

Tianhe Lin; Ziwei Xiong; Baoyuan Ou; Yingjie Qin; Lai Xu; Xiaocheng Zhong; Yao Hu; Zhiyong Wang; Tao Zhou; Yubin Xu; Di Wu

arXiv:2602.11562 · cs.IR · February 13, 2026

Open Access

TL;DR

LASER introduces a target-aware segmented attention framework with system and algorithmic innovations, significantly improving efficiency and effectiveness in modeling ultra-long user behavior sequences for recommendation systems.

Contribution

The paper presents LASER, a full-stack framework that combines a schema-aware serving infrastructure (SeqVault) with a target-aware segmented attention mechanism to model long user behavior sequences efficiently in real-time environments.

Findings

01. 50% reduction in retrieval latency

02. 75% decrease in CPU usage

03. Over 2% lift in revenue in online tests

Abstract

Modeling ultra-long user behavior sequences is pivotal for capturing evolving and lifelong interests in modern recommendation systems. However, deploying such models in real-time industrial environments faces a strict "Latency Wall", constrained by two distinct bottlenecks: the high I/O latency of retrieving massive user histories and the quadratic computational complexity of standard attention mechanisms. To break these bottlenecks, we present LASER, a full-stack optimization framework developed and deployed at Xiaohongshu (RedNote). Our approach tackles the challenges through two complementary innovations: (1) System efficiency: We introduce SeqVault, a unified schema-aware serving infrastructure for long user histories. By implementing a hybrid DRAM-SSD indexing strategy, SeqVault reduces retrieval latency by 50% and CPU usage by 75%, ensuring millisecond-level access to full…
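The quadratic-cost bottleneck named in the abstract can be illustrated with a small sketch. This is not LASER's segmented attention, whose details are not given in the text above. It is a generic comparison, under the assumption that "target-aware" means a single candidate item attends over the user history, so the number of query-key dot products grows linearly with history length instead of quadratically. The function names `target_attention` and `score_counts` are hypothetical and chosen only for this example.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention(target, history):
    """Generic single-query attention (an assumption, not LASER's method).

    The candidate item `target` is the only query, so scoring a history of
    length L needs L dot products rather than L * L.
    """
    scale = math.sqrt(len(target))
    weights = softmax([dot(target, h) / scale for h in history])
    dim = len(target)
    # Weighted sum of history vectors, one output coordinate at a time.
    return [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]


def score_counts(seq_len):
    """Query-key dot products needed by each attention pattern."""
    return {
        "self_attention": seq_len * seq_len,  # every item attends to every item
        "target_attention": seq_len,          # one query attends to every item
    }
```

For a history of 10,000 behaviors, `score_counts(10000)` gives 100,000,000 dot products for full self-attention against 10,000 for a single target query, which is the gap that motivates cheaper attention patterns over long sequences.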

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Algorithms and Data Compression