Neural Attention Memory
Hyoungwook Nam, Seung Byum Seo

TL;DR
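This paper introduces Neural Attention Memory (NAM), a differentiable memory architecture that recasts the attention mechanism as a memory that neural networks can both read and write. The authors apply it to algorithmic tasks, few-shot learning, and long-range attention, and report gains over baselines such as the differentiable neural computer (DNC) and a cosine classifier.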
Contribution
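The paper reframes the attention mechanism as NAM, a memory structure that is read and written through differentiable linear algebra operations, and evaluates it in three settings: memory-augmented neural networks (MANNs), few-shot learning, and efficient long-range attention in Transformers.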
Findings
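The NAM-based MANNs (LSAM and NAM-TM) outperform baselines such as DNC on algorithmic zero-shot generalization tasks.
In N-way K-shot learning, NAM reduces false positives more effectively than the baseline cosine classifier.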
NAM enables efficient long-range attention in Transformers.
Abstract
We propose a novel perspective of the attention mechanism by reinventing it as a memory architecture for neural networks, namely Neural Attention Memory (NAM). NAM is a memory structure that is both readable and writable via differentiable linear algebra operations. We explore three use cases of NAM: memory-augmented neural network (MANN), few-shot learning, and efficient long-range attention. First, we design two NAM-based MANNs of Long Short-term Attention Memory (LSAM) and NAM Turing Machine (NAM-TM) that show better computational powers in algorithmic zero-shot generalization tasks compared to other baselines such as differentiable neural computer (DNC). Next, we apply NAM to the N-way K-shot learning task and show that it is more effective at reducing false positives compared to the baseline cosine classifier. Finally, we implement an efficient Transformer with NAM and evaluate it with…
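To make the phrase "readable and writable via differentiable linear algebra operations" concrete, here is a minimal sketch of a matrix memory in plain Python. It assumes an outer-product write and a matrix-vector read, which is one common way to build such a memory; the function names, the `strength` parameter, and the example keys and values are hypothetical, and the paper's exact read and write operations (for example any erase or normalization terms) are not reproduced here.

```python
# Hypothetical sketch of a matrix memory; not the paper's implementation.
# Assumption: a write adds strength * outer(value, key) to the memory,
# and a read is the matrix-vector product memory @ key.

def zeros(rows, cols):
    """Create an empty memory matrix of shape (rows, cols)."""
    return [[0.0] * cols for _ in range(rows)]


def write(memory, key, value, strength=1.0):
    """Return a new memory equal to M + strength * outer(value, key)."""
    return [
        [m_ij + strength * v_i * k_j for m_ij, k_j in zip(row, key)]
        for row, v_i in zip(memory, value)
    ]


def read(memory, key):
    """Return the matrix-vector product M @ key."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, key)) for row in memory]


# Value dimension 2, key dimension 3.
memory = zeros(2, 3)
k1 = [1.0, 0.0, 0.0]
k2 = [0.0, 1.0, 0.0]

# Store two values under two orthonormal keys.
memory = write(memory, k1, [2.0, -1.0])
memory = write(memory, k2, [0.5, 4.0])

r1 = read(memory, k1)  # recovers the value stored under k1
r2 = read(memory, k2)  # recovers the value stored under k2
```

Because both operations are sums and products, they are differentiable with respect to the keys, values, and memory. With orthonormal keys, as in this toy example, each read returns exactly the value written under that key; with overlapping keys, reads would return a mixture of the stored values.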
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Neural Networks and Reservoir Computing
Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Dense Connections · Absolute Position Encodings · Adam · Multi-Head Attention · Position-Wise Feed-Forward Layer · Dropout · Byte Pair Encoding
