Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

Zhen Qin; Xuyang Shen; Dong Li; Weigao Sun; Stan Birchfield; Richard Hartley; Yiran Zhong

arXiv:2405.17383 · cs.CL · May 28, 2024

Open Access

TL;DR

This paper introduces the Linear Complexity Sequence Model (LCSM), unifying various sequence modeling techniques with a three-stage process to improve understanding and performance in language modeling and retrieval tasks.

Contribution

The paper proposes a unified framework for sequence models based on linear complexity, analyzing the impact of each stage and setting to enhance comprehension and performance.

Findings

01. Data-driven methods improve language modeling.

02. Hand-crafted methods enhance retrieval tasks.

03. The impact of each stage's settings is analyzed comprehensively.

Abstract

We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space model, long convolution, and linear RNN, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint. Specifically, we segment the modeling processes of these models into three distinct stages: Expand, Oscillation, and Shrink (EOS), with each model having its own specific settings. The Expand stage involves projecting the input signal onto a high-dimensional memory state. This is followed by recursive operations performed on the memory state in the Oscillation stage. Finally, the memory state is projected back to a low-dimensional space in the Shrink stage. We perform comprehensive experiments…
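The three stages described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact equations, so everything below is an assumption made for illustration: the function name `eos_scan`, the per-step vectors `expand` and `shrink`, the single scalar `decay`, and the update rule itself (an outer-product write into memory, a scalar-decay recurrence, and a weighted read-out) are hypothetical choices, not the authors' implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def eos_scan(inputs: Matrix, expand: Matrix, shrink: Matrix, decay: float) -> Matrix:
    """Illustrative Expand / Oscillation / Shrink recurrence (hypothetical form).

    inputs: n input vectors x_t, each of size d.
    expand: n expand vectors e_t, each of size k (assumed given per step).
    shrink: n shrink vectors s_t, each of size k (assumed given per step).
    decay:  scalar applied to the memory at every step (assumed constant here).
    """
    k = len(expand[0])
    d = len(inputs[0])
    # The memory state is a k x d matrix, larger than a single d-dim input.
    memory = [[0.0] * d for _ in range(k)]
    outputs = []
    for x_t, e_t, s_t in zip(inputs, expand, shrink):
        # Oscillation: recursively update the previous memory (scalar decay),
        # Expand: write the input into memory via the outer product e_t x_t^T.
        memory = [
            [decay * memory[i][j] + e_t[i] * x_t[j] for j in range(d)]
            for i in range(k)
        ]
        # Shrink: project the k x d memory back to a d-dim output with s_t.
        y_t = [sum(s_t[i] * memory[i][j] for i in range(k)) for j in range(d)]
        outputs.append(y_t)
    return outputs
```

Each step costs O(k·d) regardless of position, so the whole sequence is processed in time linear in its length. Under these assumptions, setting `decay` to 1.0 and passing key and query vectors as `expand` and `shrink` gives a causal linear-attention-style accumulation without normalization.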

Taxonomy

Topics: Data Mining and Machine Learning Applications