Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

Ileana Rugina; Rumen Dangovski; Li Jing; Preslav Nakov; Marin Soljačić

arXiv:2012.02030·cs.CL·May 20, 2024

TL;DR

This paper introduces Attention Pruning (AP), a novel framework that identifies global attention sparseness in NLP models, significantly reducing computation and memory use while maintaining performance.

Contribution

The paper presents a pruning framework that observes attention patterns on a fixed dataset and derives global sparsity masks from them, making attention mechanisms in NLP models more efficient.

Findings

01. AP reduces attention computation by 90% in language modeling.

02. AP decreases attention computation by about 50% in machine translation and GLUE tasks.

03. The method reveals key differences between self- and cross-attention patterns.

Abstract

Attention mechanisms play a crucial role in the neural revolution of Natural Language Processing (NLP). With the growth of attention-based models, several pruning techniques have been developed to identify and exploit sparseness, making these models more efficient. Most efforts focus on hard-coding attention patterns or pruning attention weights based on training data. We propose Attention Pruning (AP), a framework that observes attention patterns in a fixed dataset and generates a global sparseness mask. AP saves 90% of attention computation for language modeling and about 50% for machine translation and GLUE tasks, maintaining result quality. Our method reveals important distinctions between self- and cross-attention patterns, guiding future NLP research. Our framework can reduce both latency and memory requirements for any attention-based model, aiding in the development of improved…
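
The core idea in the abstract, observing attention patterns over a fixed dataset and turning them into one global mask, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `global_sparsity_mask` and `masked_softmax`, the use of a plain average over examples, the rank-based cutoff, and the rule that each query keeps its strongest key are all assumptions made for illustration.

```python
import math


def global_sparsity_mask(attention_maps, prune_fraction):
    """Build one boolean keep-mask shared by all inputs (illustrative only).

    attention_maps: list of square matrices (lists of rows), one per example.
    prune_fraction: fraction of positions to try to drop, in [0, 1].
    """
    n = len(attention_maps[0])
    count = len(attention_maps)
    # Average attention weight at each (query, key) position over the dataset.
    mean = [[sum(m[i][j] for m in attention_maps) / count for j in range(n)]
            for i in range(n)]
    # Rank positions from weakest to strongest average attention.
    ranked = sorted((mean[i][j], i, j) for i in range(n) for j in range(n))
    k = int(prune_fraction * n * n)
    mask = [[True] * n for _ in range(n)]
    for _, i, j in ranked[:k]:
        mask[i][j] = False
    # Assumption: every query keeps its strongest key so no row ends up empty.
    for i in range(n):
        j_best = max(range(n), key=lambda j: mean[i][j])
        mask[i][j_best] = True
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax that assigns zero weight to pruned positions."""
    out = []
    for row, keep in zip(scores, mask):
        top = max(s for s, kept in zip(row, keep) if kept)
        exps = [math.exp(s - top) if kept else 0.0
                for s, kept in zip(row, keep)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

In this sketch the mask is computed once from the observed attention maps and then reused unchanged for every new input, which is what makes the sparseness global rather than input-dependent. Actual compute and memory savings would additionally require a kernel that skips the pruned positions instead of computing and then zeroing them.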

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)

Methods: Pruning · Linear Layer · WordPiece · Residual Connection · Dense Connections · Attention Is All You Need · Softmax · Dropout · Weight Decay