ELASTIC: Efficient Linear Attention for Sequential Interest Compression
Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou

TL;DR
ELASTIC introduces a linear attention mechanism for sequential recommendation models, drastically reducing computational costs and memory usage while maintaining high accuracy for modeling long user behavior sequences.
Contribution
ELASTIC proposes a novel linear attention mechanism with interest compression and a large interest memory bank, enabling scalable and efficient long sequence modeling in recommendation systems.
Findings
Reduces GPU memory usage by up to 90%.
Speeds up inference by 2.7 times.
Outperforms baseline models on public datasets.
Abstract
State-of-the-art sequential recommendation models rely heavily on the transformer's attention mechanism. However, the quadratic computational and memory complexity of self-attention has limited its scalability for modeling users' long-range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, which requires only linear time complexity and decouples model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism that compresses long-term behaviour sequences into a significantly more compact representation, reducing GPU memory usage by up to 90% with a 2.7x inference speedup. The proposed linear dispatcher attention mechanism significantly reduces the quadratic complexity and makes the model feasible for adequately modeling…
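The abstract describes compressing a long behaviour sequence through a fixed number of interest experts so that cost grows linearly with sequence length. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes a two-step scheme (experts attend over behaviours, then behaviours attend over the compressed interests), omits learned query/key/value projections, and uses hypothetical names (`dispatcher_attention`, `experts`, `behaviors`) and random values in place of trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dispatcher_attention(behaviors, experts):
    """Illustrative two-step attention through k fixed-length experts.

    behaviors: (n, d) embeddings of a user's n past interactions.
    experts:   (k, d) hypothetical interest-expert vectors, with k << n.
    Both score matrices are k-by-n or n-by-k, so cost is O(n * k * d)
    rather than the O(n^2 * d) of full self-attention.
    """
    n, d = behaviors.shape
    scale = 1.0 / np.sqrt(d)

    # Step 1 (compress): each expert attends over all n behaviours.
    compress_weights = softmax(experts @ behaviors.T * scale, axis=-1)  # (k, n)
    compressed = compress_weights @ behaviors                            # (k, d)

    # Step 2 (dispatch): each behaviour attends over the k compressed interests.
    dispatch_weights = softmax(behaviors @ compressed.T * scale, axis=-1)  # (n, k)
    output = dispatch_weights @ compressed                                  # (n, d)
    return output, compressed


# Assumed toy sizes: 1000 behaviours, 8 experts, 16-dimensional embeddings.
rng = np.random.default_rng(0)
n, k, d = 1000, 8, 16
behaviors = rng.normal(size=(n, d))
experts = rng.normal(size=(k, d))

output, compressed = dispatcher_attention(behaviors, experts)
print(output.shape, compressed.shape)
```

In this sketch no n-by-n matrix is ever formed: the largest intermediate is n-by-k, which is where the linear scaling in sequence length comes from when k is held fixed.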
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Computability, Logic, AI Algorithms
Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
