Lateralization MLP: A Simple Brain-inspired Architecture for Diffusion
Zizhao Hu, Mohammad Rostami

TL;DR
The paper introduces Lateralization MLP (L-MLP), a brain-inspired architecture that rivals transformers in diffusion tasks while being more efficient, by mimicking human brain lateralization in a simple, scalable MLP design.
Contribution
Proposes the Lateralization MLP (L-MLP), a novel brain-inspired architecture that outperforms other MLP variants and matches transformer performance in diffusion tasks.
Findings
L-MLP outperforms other MLP variants.
L-MLP performs comparably to transformers in diffusion tasks.
L-MLP is more efficient than transformers while remaining effective on diffusion tasks.
Abstract
The Transformer architecture has dominated machine learning across a wide range of tasks. Its defining characteristic is an expensive scaled dot-product attention mechanism that models inter-token interactions and is widely regarded as the reason behind its success. However, this mechanism has no direct parallel in the human brain, which raises the question of whether scaled dot-product attention is necessary for intelligence with strong expressive power. Inspired by the lateralization of the human brain, we propose a new, simple but effective architecture called the Lateralization MLP (L-MLP). Complex architectures can be built by stacking L-MLP blocks. Each L-MLP block is based on a multi-layer perceptron (MLP) that permutes data dimensions, processes each dimension in parallel, merges them, and finally passes the result through a joint MLP. We discover that this specific design…
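The block structure described in the abstract (permute, process each dimension in parallel, merge, joint MLP) can be illustrated with a rough NumPy sketch. The abstract is truncated and gives no layer details, so everything specific here is an assumption rather than the authors' implementation: the two branches (one mixing along channels, one along tokens via a transpose), merging by concatenation, the GELU activation, the residual connection, and all names such as `l_mlp_block` and `init_block` are hypothetical.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (activation choice is an assumption)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # two-layer perceptron applied along the last axis of x
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mlp(rng, d_in, d_hidden, d_out, scale=0.02):
    # small random weights, zero biases
    return (rng.normal(0.0, scale, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0.0, scale, (d_hidden, d_out)), np.zeros(d_out))

def init_block(rng, n_tokens, n_channels, hidden):
    # hypothetical parameter layout: one MLP per data dimension plus a joint MLP
    return {
        "channel": init_mlp(rng, n_channels, hidden, n_channels),
        "token": init_mlp(rng, n_tokens, hidden, n_tokens),
        "joint": init_mlp(rng, 2 * n_channels, hidden, n_channels),
    }

def l_mlp_block(x, params):
    # x has shape (n_tokens, n_channels)
    channel_branch = mlp(x, *params["channel"])      # mix along the channel dimension
    token_branch = mlp(x.T, *params["token"]).T      # permute, mix along tokens, permute back
    merged = np.concatenate([channel_branch, token_branch], axis=-1)  # merge the two branches
    return x + mlp(merged, *params["joint"])         # joint MLP; the residual is an assumption

# Usage: stack two blocks on a random (tokens, channels) input.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 32))
blocks = [init_block(rng, n_tokens=16, n_channels=32, hidden=64) for _ in range(2)]
y = x
for params in blocks:
    y = l_mlp_block(y, params)
print(y.shape)
```

In this sketch the transposed branch is the only place tokens interact, standing in for the role attention plays in a transformer. Its cost is linear in the number of tokens for a fixed hidden width, but it also ties the weights to a fixed sequence length.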
Taxonomy
Topics: Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
