GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant

TL;DR
This paper introduces GMAT, a method that augments sparse Transformer models with a global memory component to efficiently capture long-range dependencies, improving performance on various NLP tasks.
Contribution
It proposes a novel global memory augmentation for sparse Transformers that enhances long-range context modeling with manageable memory overhead.
Findings
Significant improvement on global reasoning tasks
Enhanced masked language modeling performance
Better reading comprehension results
Abstract
Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the attention, which has a large memory requirement for sequences of length L, limiting its ability to process long documents. This has been the subject of substantial interest recently, where multiple approximations were proposed to reduce the quadratic memory requirement using sparse attention matrices. In this work, we propose to augment sparse Transformer blocks with a dense attention-based global memory of length M (≪ L) which provides an aggregate global view of the entire input sequence to each position. Our augmentation has a manageable memory overhead, and can be seamlessly integrated with prior…
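The attention pattern the abstract describes can be sketched as a boolean mask. This is an illustration under stated assumptions, not the authors' implementation. The function names (gmat_attention_mask, count_attended_pairs) are hypothetical. A sliding local window stands in for whichever sparse attention the base model uses. The sketch also assumes that memory positions attend to every position and that every position attends to the memory.

```python
def gmat_attention_mask(seq_len, mem_len, window):
    """Build a boolean attention mask over mem_len + seq_len positions.

    Positions 0..mem_len-1 are memory tokens; the rest are the main sequence.
    mask[q][k] is True when query position q may attend to key position k.
    """
    total = mem_len + seq_len
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < mem_len or k < mem_len:
                # Dense part (assumption): memory rows see every position,
                # and every row sees the memory columns.
                mask[q][k] = True
            else:
                # Sparse part (assumption): a local window stands in for
                # the base model's sparse attention pattern.
                mask[q][k] = abs(q - k) <= window
    return mask


def count_attended_pairs(mask):
    """Count the (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


# Small example: 8 sequence tokens, 2 memory tokens, window of 1.
mask = gmat_attention_mask(seq_len=8, mem_len=2, window=1)
print(count_attended_pairs(mask), "of", len(mask) ** 2, "pairs attended")
```

In this sketch the dense part contributes M·(2L + M) pairs, which grows linearly in L for a fixed memory length M, whereas full attention over the same positions would need (L + M)² pairs.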
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Topic Modeling · Ferroelectric and Negative Capacitance Devices
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Label Smoothing · Multi-Head Attention · Weight Decay · Adam
