Divine Benevolence is an $x^2$: GLUs scale asymptotically faster than MLPs

Alejandro Francisco Queiruga

arXiv:2602.14495 · cs.LG · February 25, 2026

TL;DR

This paper demonstrates that Gated Linear Units (GLUs) and related architectures have a quadratic approximation capability that enables faster asymptotic scaling compared to traditional MLPs, supported by theoretical analysis and empirical validation.

Contribution

The paper applies numerical analysis to show that GLUs achieve a quadratic order of approximation, so their loss decreases asymptotically faster with parameter count than that of MLPs. It also introduces the Gated Quadratic Unit for even better scaling.
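The numerical-analysis argument can be sketched with the classical error estimate for piecewise polynomial interpolation. This is a textbook bound, not the paper's own construction. It assumes that the parameter count $P$ grows in proportion to the number of pieces $N$ and that $L$ is the maximum absolute error, since this page does not state which error norm the paper uses. For smooth $f$ whose $(k+1)$-th derivative does not vanish, the bound is attained up to a constant, so it gives the rate and not just an upper limit.

```latex
\begin{aligned}
\| f - I_k f \|_\infty &\le C_k \, h^{k+1} \, \| f^{(k+1)} \|_\infty, \qquad h = 1/N, \\
L(P) &\propto P^{-(k+1)} \quad \text{(assuming } P \propto N\text{)}, \\
k = 1 \ (\text{piecewise linear, ReLU MLP}) &\;\Rightarrow\; L(P) \propto P^{-2}, \\
k = 2 \ (\text{piecewise quadratic, GLU}) &\;\Rightarrow\; L(P) \propto P^{-3}.
\end{aligned}
```

Here $I_k f$ denotes the degree-$k$ piecewise interpolant on a uniform grid over $[0, 1]$; the notation is illustrative.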

Findings

01

GLUs have a piecewise quadratic ($x^2$) functional form, which enables a quadratic order of approximation.

02

The loss scales as $L(P) \propto P^{-3}$ for GLUs, faster than $L(P) \propto P^{-2}$ for MLPs.

03

Empirical verification confirms theoretical scaling laws in 1D function approximation.
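To see where the $x^2$ comes from, here is a scalar sketch comparing one ReLU MLP unit with one gated unit whose gate is a ReLU, as in the ReGLU variant. The scalar parameters (`w`, `b`, `u`, `c`, `v`) and the helper names are hypothetical, chosen for illustration. A ReLU gate is used so that the piecewise-quadratic form is exact; smooth gates such as the Swish gate in SwiGLU do not produce exact pieces. A second finite difference near zero means the function is locally linear, while a constant nonzero value means it is locally quadratic.

```python
def relu(z: float) -> float:
    """Rectified linear unit."""
    return max(z, 0.0)


def mlp_unit(x: float, w: float = 1.0, b: float = 0.0, v: float = 1.0) -> float:
    """One ReLU MLP hidden unit, v * relu(w*x + b): piecewise linear in x."""
    return v * relu(w * x + b)


def reglu_unit(x: float, w: float = 1.0, b: float = 0.0,
               u: float = 1.0, c: float = 0.0, v: float = 1.0) -> float:
    """Hypothetical scalar ReGLU-style unit: a ReLU gate times a linear value.

    relu(w*x + b) * (u*x + c) is 0 on one side of the kink and a quadratic
    in x on the other, so the unit is piecewise quadratic.
    """
    return v * relu(w * x + b) * (u * x + c)


def second_difference(f, x: float, h: float = 1e-3) -> float:
    """Central second difference, approximating f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


if __name__ == "__main__":
    for x in (-0.5, 0.5):
        print(f"x={x:+.1f}  MLP f''~{second_difference(mlp_unit, x):.3f}  "
              f"ReGLU f''~{second_difference(reglu_unit, x):.3f}")
```

With the default parameters the gated unit equals $x^2$ for $x \ge 0$ and $0$ otherwise, so its second difference is about 2 to the right of the kink, while the ReLU unit's is about 0 away from its kink. Sums of such gated units can therefore represent piecewise quadratics rather than only piecewise linear functions.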

Abstract

Scaling laws can be understood from ground-up numerical analysis, where traditional function approximation theory can explain shifts in model architecture choices. GLU variants now dominate frontier LLMs, and similar outer-product architectures are prevalent in ranking models. The success of these architectures has mostly been left as an empirical discovery. In this paper, we apply the tools of numerical analysis to expose a key factor: these models have an $x^{2}$ term, which enables asymptotically faster scaling than MLPs. GLUs have piecewise quadratic functional forms that are sufficient to exhibit a quadratic order of approximation. Our key contribution is to demonstrate that the $L(P)$ scaling slope is $L(P) \propto P^{-3}$ for GLUs but only $L(P) \propto P^{-2}$ for MLPs on function reconstruction problems. We provide a parameter construction and empirical verification of these slopes for 1D…
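The two slopes can be reproduced in miniature with classical interpolation, used here as a stand-in for trained networks: piecewise linear interpolation plays the role of a ReLU MLP and piecewise quadratic interpolation the role of a GLU. This is not the paper's parameter construction. It assumes $P$ proportional to the number of pieces and measures $L$ as the maximum absolute error on a fine sample grid; the function names `interp_max_error` and `estimated_slope` are hypothetical.

```python
import math


def interp_max_error(f, n_pieces: int, degree: int, n_eval: int = 2000) -> float:
    """Max |f - p| on [0, 1] for piecewise interpolation p of f.

    degree=1: linear on each piece (nodes at both ends).
    degree=2: quadratic on each piece (nodes at both ends and the midpoint).
    """
    if degree not in (1, 2):
        raise ValueError("degree must be 1 or 2")
    h = 1.0 / n_pieces
    worst = 0.0
    for k in range(n_eval + 1):
        x = k / n_eval
        i = min(int(x / h), n_pieces - 1)  # index of the piece containing x
        a = i * h
        t = (x - a) / h  # local coordinate in [0, 1]
        if degree == 1:
            p = (1.0 - t) * f(a) + t * f(a + h)
        else:
            # Lagrange basis on the local nodes t = 0, 1/2, 1
            p = (2.0 * (t - 0.5) * (t - 1.0) * f(a)
                 - 4.0 * t * (t - 1.0) * f(a + 0.5 * h)
                 + 2.0 * t * (t - 0.5) * f(a + h))
        worst = max(worst, abs(f(x) - p))
    return worst


def estimated_slope(f, degree: int, n_coarse: int = 8, n_fine: int = 64) -> float:
    """Log-log slope of max error versus number of pieces (a proxy for P)."""
    e_coarse = interp_max_error(f, n_coarse, degree)
    e_fine = interp_max_error(f, n_fine, degree)
    return math.log(e_fine / e_coarse) / math.log(n_fine / n_coarse)


if __name__ == "__main__":
    for degree, label in ((1, "piecewise linear (MLP-like)"),
                          (2, "piecewise quadratic (GLU-like)")):
        print(f"{label}: slope ~ {estimated_slope(math.exp, degree):.2f}")
```

On $f(x) = e^x$ the printed slopes should come out close to $-2$ and $-3$, matching the $P^{-2}$ and $P^{-3}$ rates: doubling the number of pieces cuts the error by roughly 4x for the linear case and 8x for the quadratic case.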

Taxonomy

Topics: Machine Learning in Materials Science · Computability, Logic, AI Algorithms · Logic, programming, and type systems