Divine Benevolence is an $x^2$: GLUs scale asymptotically faster than MLPs
Alejandro Francisco Queiruga

TL;DR
This paper shows that Gated Linear Units (GLUs) and related architectures can approximate functions to quadratic order, which gives them asymptotically faster scaling than traditional MLPs; the claim is supported by theoretical analysis and empirical validation.
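A one-line expansion shows where the $x^2$ comes from. This sketch assumes a scalar input $x$ and a ReLU gate with positive weight $w$ and kink at $t$ (the ReGLU form). The symbols $v$ and $c$ are hypothetical names for the weight and bias of the linear branch, and $(z)_+ = \max(z, 0)$. The paper's exact parameterization may differ.

$$h_{\mathrm{MLP}}(x) = \mathrm{ReLU}\big(w(x-t)\big) = w\,(x-t)_+$$

$$h_{\mathrm{GLU}}(x) = \mathrm{ReLU}\big(w(x-t)\big)\,(v x + c) = w\,(x-t)_+\big(v(x-t) + vt + c\big) = wv\,(x-t)_+^2 + w(vt + c)\,(x-t)_+$$

The last step uses $(x-t)_+\,(x-t) = (x-t)_+^2$. A sum of MLP units is therefore continuous and piecewise linear in $x$. A sum of GLU units is continuous and piecewise quadratic, with a kink at each gate's $t$.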
Contribution
The paper uses numerical analysis to show that GLUs achieve a quadratic order of approximation, so their loss falls faster with parameter count than that of MLPs. It also introduces the Gated Quadratic Unit, which is intended to scale better still.
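The step from "piecewise quadratic" to a steeper slope is the textbook interpolation estimate. The sketch below makes several assumptions: $L$ is measured as max-norm error, the target $f$ is smooth on a bounded interval $[a, b]$, the kinks are uniformly spaced with width $h$, and $P$ is proportional to the number of pieces $N = (b-a)/h$. Here $\Pi_h^{p} f$ denotes the continuous piecewise polynomial interpolant of degree $p$.

$$\max_{x \in [a,b]} \big|f(x) - \Pi_h^{p} f(x)\big| \le C_p\, h^{p+1} \max_{x \in [a,b]} \big|f^{(p+1)}(x)\big|$$

$$h \propto P^{-1} \;\Rightarrow\; L(P) \propto P^{-(p+1)}: \qquad p = 1 \text{ (MLP)} \Rightarrow P^{-2}, \qquad p = 2 \text{ (GLU)} \Rightarrow P^{-3}$$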
Findings
GLUs have an $x^2$ functional form enabling quadratic approximation.
The loss scales as $L(P) \propto P^{-3}$ for GLUs, faster than $L(P) \propto P^{-2}$ for MLPs.
Empirical verification confirms theoretical scaling laws in 1D function approximation.
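The slopes can be checked with a small numerical experiment. This sketch is only a stand-in for the paper's 1D experiments and does not reproduce them. It rests on several assumptions:

- The gate kinks are fixed at uniformly spaced points $t_i$ on $[0, 1]$.
- Only the remaining weights are fitted, by linear least squares (no gradient training). The GLU's value-branch and output weights enter only as products, which the fit absorbs into two coefficients per unit.
- The target is $e^x$, chosen here for illustration.
- Error is measured in the max norm.

Each MLP unit and each GLU unit carries a fixed number of parameters, so width is proportional to $P$. The log-log slope against width therefore equals the slope against $P$.

```python
import numpy as np

def target(x):
    # Smooth 1D test function (an assumption, not the paper's benchmark).
    return np.exp(x)

def mlp_features(x, knots):
    # Hidden unit i of a ReLU MLP with a fixed first layer: ReLU(x - t_i).
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def glu_features(x, knots):
    # Hidden unit i of a ReLU-gated GLU: ReLU(x - t_i) * (a_i * x + c_i).
    # The product is linear in (a_i, c_i), so expose the two features
    # ReLU(x - t_i) * x and ReLU(x - t_i) and let least squares pick them.
    gate = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.hstack([gate * x[:, None], gate])

def max_error(features, n_units, n_train=4000, n_test=20001):
    # Uniform kinks t_i = i / n_units; t_0 = 0 sits on the left edge.
    knots = np.linspace(0.0, 1.0, n_units, endpoint=False)

    def design(x):
        # Append a constant column for the output bias.
        return np.hstack([features(x, knots), np.ones((x.size, 1))])

    x_train = np.linspace(0.0, 1.0, n_train)
    coef, *_ = np.linalg.lstsq(design(x_train), target(x_train), rcond=None)
    x_test = np.linspace(0.0, 1.0, n_test)
    return np.max(np.abs(design(x_test) @ coef - target(x_test)))

def loglog_slope(features, widths=(4, 8, 16, 32)):
    # Slope of log(max error) against log(width); width is proportional to P.
    errs = [max_error(features, n) for n in widths]
    return np.polyfit(np.log(widths), np.log(errs), 1)[0]

slope_mlp = loglog_slope(mlp_features)
slope_glu = loglog_slope(glu_features)
print(f"MLP slope ~ {slope_mlp:.2f}, GLU slope ~ {slope_glu:.2f}")
```

Running it should print slopes close to $-2$ for the MLP and close to $-3$ for the GLU. The exact values depend on the target function and the widths chosen.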
Abstract
Scaling laws can be understood from ground-up numerical analysis, where traditional function approximation theory can explain shifts in model architecture choices. GLU variants now dominate frontier LLMs, and similar outer-product architectures are prevalent in ranking models. The success of these architectures has mostly been left as an empirical discovery. In this paper, we apply the tools of numerical analysis to expose a key factor: these models have an $x^2$ which enables asymptotically faster scaling than MLPs. GLUs have piecewise quadratic functional forms that are sufficient to exhibit quadratic order of approximation. Our key contribution is to demonstrate that the $L(P)$ scaling slope is $L(P) \propto P^{-3}$ for GLUs but only $L(P) \propto P^{-2}$ for MLPs on function reconstruction problems. We provide a parameter construction and empirical verification of these slopes for 1D…
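The following minimal NumPy sketch makes the architectural difference concrete. It defines an MLP feed-forward block and a ReLU-gated GLU block (the ReGLU variant). Production LLMs more often use a smooth gate such as SwiGLU and usually omit biases. All function and weight names here are hypothetical. Both blocks use the same two hidden units, but the MLP can only produce the piecewise linear $|x|$, while the GLU reproduces $x^2$ exactly.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_ffn(x, W_in, b_in, W_out):
    # MLP block: ReLU(x W_in + b_in) W_out, hence piecewise linear in x.
    return relu(x @ W_in + b_in) @ W_out

def reglu_ffn(x, W_gate, b_gate, W_val, b_val, W_out):
    # ReLU-gated GLU block: the gated branch multiplies a linear branch
    # elementwise, so each hidden unit is piecewise quadratic in x.
    return (relu(x @ W_gate + b_gate) * (x @ W_val + b_val)) @ W_out

# Scalar inputs as a column, shape (13, 1).
x = np.linspace(-3.0, 3.0, 13)[:, None]

# Two hidden units with kinks at 0, pointing in opposite directions.
W_gate = np.array([[1.0, -1.0]])
b_gate = np.zeros(2)
W_val = np.array([[1.0, -1.0]])
b_val = np.zeros(2)
W_out = np.array([[1.0], [1.0]])

# MLP: ReLU(x) + ReLU(-x) = |x|.
y_mlp = mlp_ffn(x, W_gate, b_gate, W_out)
# GLU: ReLU(x) * x + ReLU(-x) * (-x) = x^2.
y_glu = reglu_ffn(x, W_gate, b_gate, W_val, b_val, W_out)

print(np.allclose(y_mlp, np.abs(x)), np.allclose(y_glu, x**2))  # prints: True True
```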
Taxonomy
Topics: Machine Learning in Materials Science · Computability, Logic, AI Algorithms · Logic, programming, and type systems
