ELASTIC: Efficient Linear Attention for Sequential Interest Compression

Jiaxin Deng; Shiyao Wang; Song Lu; Yinfeng Li; Xinchen Luo; Yuanjun Liu; Peixing Xu; Guorui Zhou

arXiv:2408.09380·cs.AI·February 13, 2025

TL;DR

ELASTIC introduces a linear attention mechanism for sequential recommendation models. It cuts computational cost from quadratic to linear in sequence length and substantially reduces memory usage, while maintaining accuracy when modeling long user behavior sequences.
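A back-of-the-envelope count shows where the savings come from. Assume, purely for illustration, a behavior sequence of length n, hidden size d, and k much smaller than n compressed interest tokens, with one attention pass from the interest tokens to the sequence and one pass back (this two-pass structure is an assumption, not a detail given in this summary). Counting attention-score computations:

```latex
\text{self-attention: } n^2 d
\qquad
\text{two-pass compression: } 2nkd
\qquad
\frac{n^2 d}{2nkd} = \frac{n}{2k}
```

With the hypothetical values n = 1024 and k = 32, the ratio is 1024 / 64 = 16, i.e. 16 times fewer score computations. These values are illustrative only and are not the paper's measurements.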

Contribution

ELASTIC proposes a novel linear attention mechanism with interest compression and a large interest memory bank, enabling scalable and efficient long sequence modeling in recommendation systems.
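This summary does not say how interests are drawn from the memory bank. As a minimal sketch, assume each memory slot is a learned vector and a user-derived query selects the top_k slots by dot-product score; the names `retrieve_interests`, `memory_bank`, and `top_k` are hypothetical. A brute-force scan like this costs time linear in the bank size, so a sublinear lookup scheme would be needed to fully decouple capacity from cost.

```python
import heapq


def retrieve_interests(query, memory_bank, top_k):
    """Hypothetical sketch: pick the top_k memory slots by dot-product score with the query.

    Assumption: plain brute-force top-k retrieval; the source does not specify the method.
    Returns (indices, vectors) ordered from highest to lowest score.
    """
    # Score every slot: O(bank_size * d) work.
    scored = [
        (sum(q * m for q, m in zip(query, slot)), i)
        for i, slot in enumerate(memory_bank)
    ]
    # nlargest keeps only the top_k (score, index) pairs, highest first.
    best = heapq.nlargest(top_k, scored)
    indices = [i for _, i in best]
    return indices, [memory_bank[i] for i in indices]


# Toy memory bank of four 2-d interest slots (made-up values).
bank = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
idx, vecs = retrieve_interests([1.0, 0.2], bank, top_k=2)
print(idx)  # [0, 2]
```

Only the selected slots would then enter the attention computation, which is how a large bank could add capacity without adding per-request attention cost.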

Findings

01

Reduces GPU memory usage by up to 90%.

02

Speeds up inference by 2.7 times.

03

Outperforms baseline models on public datasets.

Abstract

State-of-the-art sequential recommendation models rely heavily on the transformer's attention mechanism. However, the quadratic computational and memory complexity of self-attention limits its scalability for modeling users' long-range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, which requires only linear time complexity and decouples model capacity from computational cost. Specifically, ELASTIC introduces fixed-length interest experts with a linear dispatcher attention mechanism that compresses long-term behaviour sequences into a significantly more compact representation, reducing GPU memory usage by up to 90% with a 2.7× inference speedup. The proposed linear dispatcher attention mechanism significantly reduces the quadratic complexity and makes the model feasible for adequately modeling…
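The abstract does not spell out the dispatcher attention equations. One common way to get linear cost, assumed here purely for illustration, is a two-stage scheme: k interest tokens first attend over the n behavior tokens to compress them, then each behavior token attends over the k compressed vectors, so each stage computes n·k scores instead of n². The sketch omits learned query/key/value projections, multiple heads, and masking, and the function names are hypothetical; it is not the authors' implementation.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention without projections: len(queries) x len(keys) scores."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        # Each output row is a convex combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def dispatcher_attention(seq, interests):
    """Hypothetical two-stage linear attention: O(n * k * d) instead of O(n^2 * d)."""
    # Stage 1 (dispatch/compress): k interest tokens read the n behaviors -> k x d.
    compressed = attend(interests, seq, seq)
    # Stage 2 (combine): each of the n behaviors reads the k compressed vectors -> n x d.
    return attend(seq, compressed, compressed)


# n = 8 toy behavior embeddings with d = 2, and k = 2 made-up interest tokens.
seq = [[float(i % 3), float(i % 5)] for i in range(8)]
interests = [[1.0, 0.0], [0.0, 1.0]]
out = dispatcher_attention(seq, interests)
print(len(out), len(out[0]))  # 8 2
```

Because the number of interest tokens is fixed, doubling n doubles the work in both stages rather than quadrupling it as full self-attention would.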
