Sparse Modular Activation for Efficient Sequence Modeling

Liliang Ren; Yang Liu; Shuohang Wang; Yichong Xu; Chenguang Zhu; ChengXiang Zhai

arXiv:2306.11197 · cs.LG · November 7, 2023 · 2 cites

TL;DR

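This paper introduces Sparse Modular Activation (SMA), a differentiable dynamic-sparsity mechanism. SMA improves the efficiency of sequence models by letting each sequence element activate only a subset of the available sub-modules. SeqBoat, an architecture built on SMA, achieves linear inference complexity and state-of-the-art results across a range of sequence modeling tasks.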

Contribution

The paper proposes SMA, a novel differentiable mechanism for dynamic sparse activation of sub-modules, and designs SeqBoat, a new architecture leveraging SMA for efficient sequence modeling.

Findings

1. SeqBoat achieves linear inference complexity with state-of-the-art performance.
2. SMA reduces computation and memory usage during training and inference.
3. Learned sparse activation patterns reveal task-specific attention requirements.
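The second finding can be made concrete with a back-of-the-envelope count. This sketch assumes the attention sub-module computes full quadratic attention over only the elements that activate it. That is a simplifying assumption, not necessarily how SeqBoat schedules its attention. The helper names `attention_score_count` and `attention_savings` are hypothetical, and the sizes are illustrative, not measurements from the paper.

```python
def attention_score_count(num_elements: int) -> int:
    """Number of query-key scores in full (quadratic) self-attention."""
    return num_elements * num_elements


def attention_savings(seq_len: int, num_activated: int) -> float:
    """Fraction of attention scores skipped when only `num_activated`
    of `seq_len` elements enter the attention sub-module.

    Hypothetical helper; assumes full attention among activated elements.
    """
    if not 0 <= num_activated <= seq_len:
        raise ValueError("num_activated must lie in [0, seq_len]")
    if seq_len == 0:
        return 0.0  # empty sequence: nothing to save
    full = attention_score_count(seq_len)
    sparse = attention_score_count(num_activated)
    return 1.0 - sparse / full


# Illustrative sizes only, not figures reported in the paper.
print(attention_savings(seq_len=4096, num_activated=1024))  # 0.9375
```

Under this assumption, activating a quarter of the elements cuts the number of pairwise scores by a factor of 16. If the attention matrix is materialized, its memory shrinks by the same factor.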

Abstract

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. By allowing each element to skip the sub-modules not activated for it, SMA reduces the computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention…
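The abstract describes SMA only at a high level. The pure-Python sketch below illustrates the general idea of per-element sparse activation. It is not the authors' SeqBoat implementation; the official code is in the renll/seqboat repository.

Several details are assumptions made for illustration:

- the binary skip/activate decision per element;
- the residual update;
- scaling the sub-module output by the selected probability, one plausible way to keep a gradient path in an autodiff version;
- the names `sparse_modular_activation` and `mean_mix`.

The toy `mean_mix` only stands in for the attention sub-module.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_modular_activation(
    elements: Sequence[float],
    decision_logits: Sequence[Sequence[float]],
    sub_module: Callable[[List[float]], List[float]],
) -> List[float]:
    """Toy forward pass (hypothetical): each element chooses 'skip'
    (index 0) or 'activate' (index 1). Only activated elements are
    gathered and passed through `sub_module`, then scattered back."""
    if len(elements) != len(decision_logits):
        raise ValueError("one decision per element is required")
    probs = [softmax(logits) for logits in decision_logits]
    active_idx = [i for i, p in enumerate(probs) if p[1] > p[0]]
    outputs = list(elements)  # skipped elements pass through unchanged
    if not active_idx:
        return outputs
    processed = sub_module([elements[i] for i in active_idx])  # gather
    if len(processed) != len(active_idx):
        raise ValueError("sub_module must return one output per input")
    for i, y in zip(active_idx, processed):  # scatter back
        # Assumption: residual update scaled by the 'activate' probability.
        outputs[i] = elements[i] + probs[i][1] * y
    return outputs


def mean_mix(xs: List[float]) -> List[float]:
    """Toy stand-in for an attention sub-module: every activated
    element receives the mean of the activated elements only."""
    return [sum(xs) / len(xs)] * len(xs)


x = [1.0, 2.0, 3.0, 4.0]
logits = [[0.0, 5.0], [5.0, 0.0], [0.0, 5.0], [5.0, 0.0]]  # [skip, activate]
print(sparse_modular_activation(x, logits, mean_mix))
# Elements at indices 1 and 3 skip the sub-module and come back unchanged.
```

The sub-module only ever sees the gathered subset, so its cost depends on how many elements activate rather than on the full sequence length.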

Code & Models

Repositories

renll/seqboat (PyTorch, official)

Videos

Sparse Modular Activation for Efficient Sequence Modeling · SlidesLive

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Sparse Modular Activation (SMA)