Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks
Ileana Rugina, Rumen Dangovski, Li Jing, Preslav Nakov, Marin Soljačić

TL;DR
This paper introduces Attention Pruning (AP), a novel framework that identifies global attention sparseness in NLP models, significantly reducing computation and memory use while maintaining performance.
Contribution
The paper presents a new pruning framework that uses attention patterns observed on a fixed dataset to build global sparsity masks, improving the efficiency of attention mechanisms in NLP models.
Findings
AP reduces attention computation by 90% in language modeling.
AP decreases attention computation by about 50% in machine translation and GLUE tasks.
The method reveals key differences between self- and cross-attention patterns.
Abstract
Attention mechanisms play a crucial role in the neural revolution of Natural Language Processing (NLP). With the growth of attention-based models, several pruning techniques have been developed to identify and exploit sparseness, making these models more efficient. Most efforts focus on hard-coding attention patterns or pruning attention weights based on training data. We propose Attention Pruning (AP), a framework that observes attention patterns in a fixed dataset and generates a global sparseness mask. AP saves 90% of attention computation for language modeling and about 50% for machine translation and GLUE tasks, maintaining result quality. Our method reveals important distinctions between self- and cross-attention patterns, guiding future NLP research. Our framework can reduce both latency and memory requirements for any attention-based model, aiding in the development of improved…
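The idea of observing attention patterns on a fixed dataset and deriving one global mask can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the mask is computed, so averaging the observed attention matrices and keeping the largest entries is an assumption made here for illustration. The names `average_attention`, `global_mask`, `masked_attention`, and `keep_fraction` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def average_attention(attention_maps):
    """Element-wise mean of equally sized attention matrices."""
    n = len(attention_maps)
    rows, cols = len(attention_maps[0]), len(attention_maps[0][0])
    return [[sum(a[i][j] for a in attention_maps) / n for j in range(cols)]
            for i in range(rows)]


def global_mask(avg, keep_fraction):
    """Keep roughly the top `keep_fraction` of entries (assumed rule).

    Ties at the threshold are all kept, so slightly more entries
    than requested may survive.
    """
    flat = sorted((v for row in avg for v in row), reverse=True)
    k = max(1, round(keep_fraction * len(flat)))
    threshold = flat[k - 1]
    return [[v >= threshold for v in row] for row in avg]


def masked_attention(scores, mask):
    """Row-wise softmax restricted to kept positions; pruned ones are 0."""
    out = []
    for s_row, m_row in zip(scores, mask):
        kept = [s for s, keep in zip(s_row, m_row) if keep]
        if not kept:  # a fully pruned row attends to nothing
            out.append([0.0] * len(s_row))
            continue
        probs = iter(softmax(kept))
        out.append([next(probs) if keep else 0.0 for keep in m_row])
    return out


# Toy "dataset": two 3x3 attention matrices observed on different inputs.
maps = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],
    [[0.9, 0.05, 0.05], [0.3, 0.6, 0.1], [0.1, 0.1, 0.8]],
]
mask = global_mask(average_attention(maps), keep_fraction=1 / 3)

# The same fixed mask is reused for any new input's raw scores.
scores = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.2], [0.0, 1.0, 0.0]]
pruned = masked_attention(scores, mask)
```

In this toy case the mask keeps only the diagonal, so each position attends to itself. The sketch zeroes pruned entries in a dense matrix; actual savings in computation and memory would require skipping the masked positions entirely, for example with sparse kernels.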
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Pruning · Linear Layer · WordPiece · Residual Connection · Dense Connections · Attention Is All You Need · Softmax · Dropout · Weight Decay
