Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

Jerry Yao-Chieh Hu; Pei-Hsuan Chang; Robin Luo; Hong-Yu Chen; Weijian Li; Wei-Po Wang; Han Liu

arXiv:2404.03828 · cs.LG · June 28, 2024 · 2 cites

TL;DR

This paper introduces OutEffHop, an outlier-efficient modern Hopfield model for training large transformer-based models. Its memory retrieval process is approximated by the $Softmax_{1}$ attention mechanism, which yields Hopfield layers that can replace standard attention and perform better after quantization.

Contribution

The main contribution is OutEffHop, an associative memory model with outlier-efficient memory retrieval. It gives a model-based interpretation of the $Softmax_{1}$ attention mechanism and retains and improves the desirable properties of standard modern Hopfield models.

Findings

1. Achieves a 22% reduction in the kurtosis of model outputs.
2. Reduces the maximum infinity norm by 26% across the tested models.
3. Demonstrates superior performance on BERT, OPT, ViT, and STanHop-Net benchmarks.
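
The findings above use kurtosis and the maximum infinity norm as outlier metrics. The following is a minimal sketch of how such metrics can be computed for a single activation tensor. It assumes the Pearson (non-excess) definition of kurtosis taken over all entries; the paper's exact averaging protocol across layers and models is not given on this page. The names `acts`, `spiky`, and `relative_reduction` are hypothetical and chosen for illustration only.

```python
import numpy as np

def kurtosis(x):
    """Pearson kurtosis E[(x - mu)^4] / sigma^4 over all entries.

    Equals 3 for a Gaussian; heavy tails (outliers) push it higher.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    return np.mean(centered ** 4) / var ** 2

def inf_norm(x):
    """Infinity norm: the largest absolute entry of the tensor."""
    return np.max(np.abs(x))

def relative_reduction(before, after):
    """Fractional reduction of a metric, e.g. 0.22 for a 22% drop."""
    return 1.0 - after / before

# Hypothetical layer outputs: Gaussian noise, then the same with one outlier.
rng = np.random.default_rng(0)
acts = rng.standard_normal((128, 64))
spiky = acts.copy()
spiky[0, 0] = 50.0  # a single large activation

print("kurtosis:", kurtosis(acts), "->", kurtosis(spiky))
print("inf norm:", inf_norm(acts), "->", inf_norm(spiky))
```

A single large entry is enough to inflate both metrics, which is why they are commonly read as indicators of how hard a tensor is to quantize.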

Abstract

We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier inefficiency problem of training gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlier-efficient attention mechanism ($Softmax_{1}$): it is an approximation of the memory retrieval process of OutEffHop. Methodologically, this allows us to introduce novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, with superior post-quantization performance. Theoretically, the Outlier-Efficient Modern Hopfield Model retains and improves the desirable properties of standard modern Hopfield models, including fixed point…
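
The abstract names $Softmax_{1}$ without defining it in the visible text. The sketch below assumes the commonly used definition, $Softmax_{1}(x)_i = \exp(x_i) / (1 + \sum_j \exp(x_j))$, and is an illustration in NumPy, not the authors' implementation. The extra 1 in the denominator lets the outputs sum to less than 1, so a query with no relevant key can assign near-zero weight everywhere instead of being forced to spread a full unit of attention.

```python
import numpy as np

def softmax(x, axis=-1):
    """Standard softmax, shifted by the max for numerical stability."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def softmax_1(x, axis=-1):
    """Softmax with an extra 1 in the denominator (assumed definition).

    Computes exp(x_i) / (1 + sum_j exp(x_j)) in a stable way.
    """
    x = np.asarray(x, dtype=np.float64)
    # Shift by max(max(x), 0): the implicit "zero logit" joins the max,
    # so neither exp(x - m) nor exp(-m) can overflow.
    m = np.maximum(np.max(x, axis=axis, keepdims=True), 0.0)
    e = np.exp(x - m)
    return e / (np.exp(-m) + np.sum(e, axis=axis, keepdims=True))

# Hypothetical attention logits where no key is relevant to the query.
scores = np.array([-8.0, -9.0, -7.5])
p_std = softmax(scores)    # forced to sum to 1
p_one = softmax_1(scores)  # free to stay close to 0

print("softmax   total weight:", p_std.sum())
print("softmax_1 total weight:", p_one.sum())
```

When every logit is strongly negative, the standard softmax still distributes all of its mass, while the assumed $Softmax_{1}$ form leaves almost all of it unassigned.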

Code & Models

Repositories

magics-lab/outeffhop (PyTorch, official)

Taxonomy

Topics: Magnetic Properties and Applications · Model Reduction and Neural Networks

Methods: Attention Is All You Need · Softmax · WordPiece · Linear Layer · Dense Connections · OPT · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay