HyperMLP: An Integrated Perspective for Sequence Modeling
Jiecheng Lu, Shihao Yang

TL;DR
This paper presents HyperMLP and HyperGLU, novel models that unify attention and MLPs by viewing attention as a dynamic, context-dependent MLP, leading to improved sequence modeling performance.
Contribution
It introduces HyperMLP and HyperGLU, which learn dynamic mixing in both feature space and sequence space. This gives a unified perspective on attention and MLPs that extends sequence modeling beyond standard softmax attention.
Findings
HyperMLP/HyperGLU outperform softmax-attention baselines with comparable parameter counts.
The models demonstrate improved expressivity and sequence modeling capabilities.
Theoretical analysis supports the effectiveness of dynamic mixing in sequence tasks.
Abstract
Self-attention is often viewed as probabilistic query-key lookup, motivating designs that preserve normalized attention scores and fixed positional semantics. We advocate a simpler and more unified perspective: an autoregressive attention head can be viewed as a dynamic two-layer MLP whose weights are instantiated from the context history. From this view, attention scores form an ever-growing hidden representation, and standard MLP activations such as ReLU or GLU naturally implement input-conditioned selection over a context-dependent memory pool rather than a probability distribution. Based on this formulation, we introduce HyperMLP and HyperGLU, which learn dynamic mixing in both feature space and sequence space, using a reverse-offset (lag) layout to align temporal mixing with autoregressive semantics. We provide theoretical characterizations of the expressivity and implications of…
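The abstract's central reading can be sketched in a few lines of plain Python. This is a minimal illustration that assumes a single head whose query, key and value projections have already been applied, and it omits the usual 1/sqrt(d) scaling. The function names, including the `to_lag_layout` helper, are hypothetical. The sketch is not the authors' HyperMLP/HyperGLU implementation.

The cached keys act as first-layer weights, which gives one hidden unit per past token, so the hidden width grows with the context. The cached values act as second-layer weights. The activation can be softmax, which gives the probabilistic view, or ReLU, which gives the MLP view.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def activate(h: Vector, kind: str) -> Vector:
    """Apply the head's nonlinearity to the hidden pre-activations."""
    if kind == "softmax":
        # Probabilistic view: scores are normalized to sum to 1.
        m = max(h)
        e = [math.exp(z - m) for z in h]
        total = sum(e)
        return [z / total for z in e]
    if kind == "relu":
        # MLP view: keep positive matches, zero the rest, no normalization.
        return [max(z, 0.0) for z in h]
    raise ValueError(f"unknown activation: {kind}")


def head_as_dynamic_mlp(q: Vector, keys: List[Vector], values: List[Vector],
                        kind: str = "softmax") -> Vector:
    """One autoregressive head read as a two-layer MLP built from the context.

    Layer 1 uses the cached keys as weights: one hidden unit per past token,
    so the hidden width equals the current context length.
    Layer 2 uses the cached values as weights to map hidden units to outputs.
    """
    hidden = [dot(q, k) for k in keys]   # layer 1: q . k_j for every cached j
    a = activate(hidden, kind)           # nonlinearity over the memory pool
    d_out = len(values[0])
    # layer 2: output_i = sum_j a_j * v_j[i]
    return [sum(a_j * v[i] for a_j, v in zip(a, values)) for i in range(d_out)]


def to_lag_layout(hidden: Vector) -> Vector:
    """Hypothetical helper: reorder units so index l is the token l steps back."""
    return hidden[::-1]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [-1.0, 0.0], [0.5, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(head_as_dynamic_mlp(q, keys, values, kind="relu"))  # [2.0, 1.0]
    print(to_lag_layout([1.0, -1.0, 0.5]))                    # [0.5, -1.0, 1.0]
```

With ReLU, the hidden weights in this example are [1.0, 0.0, 0.5]. They sum to 1.5 rather than 1, which matches the abstract's description of selection over a context-dependent memory pool rather than a probability distribution.

The lag reordering is one plausible reading of the reverse-offset layout. Index 0 always refers to the most recent token however long the context grows, so any weight attached to a given index keeps the same relative meaning at every step.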
