Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

TL;DR
This paper introduces OutEffHop, a novel outlier-efficient Hopfield model that improves large transformer training by enhancing associative memory retrieval and attention mechanisms, with theoretical and empirical advantages over existing methods.
Contribution
The paper proposes OutEffHop, a new associative memory model that enhances outlier efficiency in transformer models, providing theoretical improvements and practical benefits over prior attention mechanisms.
Findings
Achieves a 22% reduction in the kurtosis of model outputs.
Reduces the maximum infinity norm of model outputs by 26% across tested models.
Demonstrates superior performance on BERT, OPT, ViT, and STanHop-Net benchmarks.
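The two outlier metrics named above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes kurtosis means the Pearson (non-excess) fourth standardized moment and that the infinity norm is the largest absolute entry of an output tensor. The helper names `kurtosis`, `max_inf_norm`, and `relative_reduction` are hypothetical, and the activations are synthetic.

```python
import numpy as np


def kurtosis(x):
    """Pearson kurtosis (fourth standardized moment); about 3 for a Gaussian."""
    x = np.asarray(x, dtype=np.float64).ravel()
    centered = x - x.mean()
    return float(np.mean(centered ** 4) / x.var() ** 2)


def max_inf_norm(x):
    """Largest absolute entry of the tensor (assumed meaning of 'infinity norm')."""
    return float(np.max(np.abs(np.asarray(x, dtype=np.float64))))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Synthetic stand-ins for layer outputs: a well-behaved tensor and the same
# tensor with a handful of large-magnitude outlier entries injected.
rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 768))
with_outliers = clean.copy()
with_outliers[:, 5] *= 40.0  # one hypothetical "outlier dimension"

print("kurtosis:", kurtosis(with_outliers), "->", kurtosis(clean))
print("inf norm:", max_inf_norm(with_outliers), "->", max_inf_norm(clean))
print("kurtosis reduction (%):",
      relative_reduction(kurtosis(with_outliers), kurtosis(clean)))
```

Heavy-tailed activations push both numbers up, which is why lower values are read as fewer outliers and, in turn, as friendlier to quantization.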
Abstract
We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism (Softmax_1): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point…
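The outlier-efficient attention mechanism mentioned in the abstract can be illustrated with a small sketch. It assumes the mechanism is the variant usually written Softmax_1, that is, a softmax with an extra 1 in the denominator, softmax_1(z)_i = exp(z_i) / (1 + sum_j exp(z_j)). This form is an assumption of the sketch, not something spelled out on this page. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def softmax(z, axis=-1):
    """Standard softmax: weights along `axis` always sum to 1."""
    z = np.asarray(z, dtype=np.float64)
    shifted = np.exp(z - np.max(z, axis=axis, keepdims=True))
    return shifted / shifted.sum(axis=axis, keepdims=True)


def softmax_1(z, axis=-1):
    """Softmax with +1 in the denominator (assumed form of Softmax_1).

    Equivalent to appending a fixed zero logit and discarding its weight,
    so the remaining weights may sum to less than 1.
    """
    z = np.asarray(z, dtype=np.float64)
    # Shift by max(max(z), 0) so the implicit zero logit is covered too.
    m = np.maximum(np.max(z, axis=axis, keepdims=True), 0.0)
    e = np.exp(z - m)
    return e / (np.exp(-m) + e.sum(axis=axis, keepdims=True))


def attention(q, k, v, weight_fn):
    """Single-head scaled dot-product attention with a pluggable weight function."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return weight_fn(scores, axis=-1) @ v


# When every score is very negative (no key is relevant), softmax still
# spreads a total weight of 1 over the keys; softmax_1 can assign almost none.
irrelevant = np.array([-20.0, -20.0, -20.0])
print(softmax(irrelevant).sum(), softmax_1(irrelevant).sum())
```

The design point is the "no-op" option: a head that has nothing useful to attend to no longer needs extreme logits to approximate zero output, which is one common explanation for where activation outliers come from.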
Code & Models
- 🤗 magicslabnu/OutEffHop-opt-125m (model, 17 downloads)
- 🤗 magicslabnu/OutEffHop-opt-1.3b (model, 7 downloads)
- 🤗 magicslabnu/OutEffHop_bert_base (model, 18 downloads, 2 likes)
- 🤗 magicslabnu/Clip_OutEffHop_OPT_125m (model, 10 downloads)
- 🤗 magicslabnu/Clip_OutEffHop_bert_base (model, 2 downloads)
- 🤗 magicslabnu/gate_OutEffHop_opt-125m (model, 9 downloads)
- 🤗 magicslabnu/gate_OutEffHop_bert_base (model, 3 downloads)
- 🤗 magicslabnu/clip_OutEffHop_vit_small_patch16_224 (model, 20 downloads)
- 🤗 magicslabnu/gate_OutEffHop_vit_small_patch16_224_hf (model, 11 downloads)
- 🤗 magicslabnu/OutEffHop_vit_small_patch16_224 (model, 8 downloads)
Taxonomy
Topics: Magnetic Properties and Applications · Model Reduction and Neural Networks
Methods: Attention Is All You Need · Softmax · WordPiece · Linear Layer · Dense Connections · OPT · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay
