Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
Ryo Karakida, Toshihiro Ota, Masato Taki

TL;DR
This paper combines hierarchical associative memory with MetaFormers to represent the components of a Transformer block as a single Hopfield network, and shows that symmetry breaking in the weight matrices matters for model performance.
Contribution
It presents a new theoretical perspective by integrating Krotov's associative memory with MetaFormers, leading to a parallelized MLP-Mixer model and insights into symmetry-breaking effects.
Findings
Symmetric weight matrices in the mixing modules hinder image recognition performance.
Breaking this symmetry improves the effectiveness of MLP-Mixer.
Vanilla MLP-Mixer spontaneously develops symmetry-breaking configurations during training.
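The findings above turn on how far a weight matrix departs from being symmetric. The following is a minimal sketch of one way to quantify that, not the authors' own metric: it splits a square matrix W into its symmetric part (W + Wᵀ)/2 and antisymmetric part (W − Wᵀ)/2 and reports the share of the Frobenius norm carried by the antisymmetric part. The function names and the choice of ratio are assumptions made for illustration.

```python
import math


def transpose(w):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*w)]


def split_symmetric(w):
    """Split w into (symmetric part, antisymmetric part), with w = sym + anti."""
    wt = transpose(w)
    n = len(w)
    sym = [[0.5 * (w[i][j] + wt[i][j]) for j in range(n)] for i in range(n)]
    anti = [[0.5 * (w[i][j] - wt[i][j]) for j in range(n)] for i in range(n)]
    return sym, anti


def frobenius_norm(w):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in w for x in row))


def asymmetry_ratio(w):
    """Hypothetical symmetry-breaking score: ||anti||_F / ||w||_F.

    Equals 0 for a symmetric matrix and 1 for a purely antisymmetric one,
    because the two parts are orthogonal under the Frobenius inner product.
    """
    total = frobenius_norm(w)
    if total == 0:
        return 0.0
    _, anti = split_symmetric(w)
    return frobenius_norm(anti) / total


if __name__ == "__main__":
    symmetric = [[1.0, 2.0], [2.0, 3.0]]
    mixed = [[1.0, 2.0], [0.0, 1.0]]
    print(asymmetry_ratio(symmetric))
    print(asymmetry_ratio(mixed))
```

Under these assumptions, tracking such a ratio for a mixing layer's weights across training steps would be one way to observe a matrix drifting away from a symmetric configuration.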
Abstract
Transformers have established themselves as the leading neural network model in natural language processing and are increasingly foundational in various domains. In vision, the MLP-Mixer model has demonstrated competitive performance, suggesting that attention mechanisms might not be indispensable. Inspired by this, recent research has explored replacing attention modules with other mechanisms, including those described by MetaFormers. However, the theoretical framework for these models remains underdeveloped. This paper proposes a novel perspective by integrating Krotov's hierarchical associative memory with MetaFormers, enabling a comprehensive representation of the entire Transformer block, encompassing token-/channel-mixing modules, layer normalization, and skip connections, as a single Hopfield network. This approach yields a parallelized MLP-Mixer derived from a three-layer…
Taxonomy
Topics: Neural Networks and Applications
Methods: Average Pooling · Global Average Pooling · Residual Connection · Softmax · MLP-Mixer · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam
