Rational Neural Networks have Expressivity Advantages

Maosen Tang; Alex Townsend

arXiv:2602.12390 · cs.LG · February 16, 2026

TL;DR

This paper shows that neural networks with trainable rational activation functions are more expressive and parameter-efficient than networks with standard fixed activations. The advantage is supported both by approximation-theoretic separation results and by experiments across several architectures.

Contribution

It studies trainable low-degree rational activation functions and proves that they are more expressive and parameter-efficient than standard activations. Experiments support the theory.
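Below is a minimal sketch of what a trainable rational activation can look like, written in pure Python. It rests on several assumptions made only for illustration: the type (3, 2), the pole-free denominator $Q(x) = 1 + |b_1 x + b_2 x^2|$, the class name `RationalActivation`, and the helpers `sgd_step` and `mse` are all hypothetical, and the paper's parametrization and initialization may differ. The idea is that $r(x) = P(x)/Q(x)$ has a handful of coefficients that are learned along with the weights, and its gradients with respect to those coefficients have simple closed forms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RationalActivation:
    """Hypothetical trainable rational activation r(x) = P(x) / Q(x) of type (3, 2).

    P(x) = a0 + a1*x + a2*x**2 + a3*x**3
    Q(x) = 1 + |b1*x + b2*x**2|   (assumed pole-free form: Q(x) >= 1 for all real x)
    """

    # Numerator coefficients a0..a3; the default gives r(x) = x.
    a: List[float] = field(default_factory=lambda: [0.0, 1.0, 0.0, 0.0])
    # Denominator coefficients b1, b2.
    b: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def _parts(self, x: float) -> Tuple[float, float, float]:
        """Return P(x), the inner polynomial s(x) = b1*x + b2*x**2, and Q(x)."""
        p = sum(c * x**j for j, c in enumerate(self.a))
        s = sum(c * x**k for k, c in enumerate(self.b, start=1))
        return p, s, 1.0 + abs(s)

    def __call__(self, x: float) -> float:
        p, _, q = self._parts(x)
        return p / q

    def grad(self, x: float) -> Tuple[List[float], List[float]]:
        """Partial derivatives of r(x) w.r.t. a and b (uses sign(0) = 0 at the kink of |s|)."""
        p, s, q = self._parts(x)
        # dr/da_j = x^j / Q(x)
        da = [x**j / q for j in range(len(self.a))]
        # dr/db_k = -P(x) * sign(s) * x^k / Q(x)^2
        sgn = (s > 0) - (s < 0)
        db = [-p * sgn * x**k / q**2 for k in range(1, len(self.b) + 1)]
        return da, db


def mse(act: RationalActivation, xs: List[float], ys: List[float]) -> float:
    """Mean squared error of the activation against targets ys on inputs xs."""
    return sum((act(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_step(act: RationalActivation, xs: List[float], ys: List[float], lr: float = 0.05) -> None:
    """One full-batch gradient-descent step on the MSE, updating a and b in place."""
    ga = [0.0] * len(act.a)
    gb = [0.0] * len(act.b)
    for x, y in zip(xs, ys):
        err = act(x) - y
        da, db = act.grad(x)
        for j, d in enumerate(da):
            ga[j] += 2.0 * err * d / len(xs)
        for k, d in enumerate(db):
            gb[k] += 2.0 * err * d / len(xs)
    act.a = [c - lr * g for c, g in zip(act.a, ga)]
    act.b = [c - lr * g for c, g in zip(act.b, gb)]


if __name__ == "__main__":
    xs = [i / 50 - 1 for i in range(101)]   # uniform grid on [-1, 1]
    ys = [max(x, 0.0) for x in xs]          # ReLU targets
    act = RationalActivation(b=[0.1, 0.1])  # nonzero b, so its gradient is not stuck at 0
    print("MSE before:", mse(act, xs, ys))
    for _ in range(200):
        sgd_step(act, xs, ys)
    print("MSE after: ", mse(act, xs, ys))
```

The all-zero default for `b` gives $r(x) = x$. Under the sign(0) = 0 convention, however, the gradient with respect to `b` is then also zero, so the usage example starts from a small nonzero `b`. Forcing $Q \ge 1$ rules out real poles, at the cost of restricting which rational functions the layer can represent.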

Findings

01

Rational-activation networks can uniformly approximate any fixed-activation network on compact domains with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size.

02

Rational activations outperform fixed activations in practical training scenarios.

03

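Conversely, approximating rational-activation networks with fixed-activation networks provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This gives an exponential separation in approximation complexity.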
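The worked comparison below gives a feel for the size of the gap between the abstract's two rates: a $\Omega(\log(1/\varepsilon))$ lower bound for fixed activations versus $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead for rational activations. It uses a sample target $\varepsilon = 10^{-12}$, chosen only for illustration, and natural logarithms (the base changes only constants):

$$
\log(1/\varepsilon) = 12\ln 10 \approx 27.63,
\qquad
\log\log(1/\varepsilon) \approx \ln 27.63 \approx 3.32 .
$$

Halving $\varepsilon$ again adds $\ln 2 \approx 0.69$ to the first quantity but only about $\ln(28.32/27.63) \approx 0.025$ to the second. This is why the two bounds are described as exponentially far apart.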

Abstract

We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target $\varepsilon > 0$, we establish approximation-theoretic separations: any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate…
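One intuition for why compositions of low-degree rationals are so efficient can be shown with a toy example. This is an illustration we chose, not necessarily the construction used in the paper. The type-(1, 2) map $g(x) = 2x/(1+x^2)$ is odd, increasing on $[0, 1]$, and has fixed point 1. Composing it $k$ times gives a rational function of degree at most $2^k$ using only $k$ layers. Once $g^{\circ k}(x)$ is near 1, each further composition roughly squares the error, because $1 - g(x) = (1-x)^2/(1+x^2)$. The function names `g`, `compose`, and `sign_error` are hypothetical. The sketch measures how well $g^{\circ k}$ approximates $\operatorname{sign}(x)$ for $\delta \le |x| \le 1$.

```python
def g(x: float) -> float:
    """Type-(1, 2) rational map 2x / (1 + x^2): odd, increasing on [0, 1], fixed point 1."""
    return 2.0 * x / (1.0 + x * x)


def compose(x: float, k: int) -> float:
    """Apply g k times; as a function of x this is rational of type at most (2^k - 1, 2^k)."""
    for _ in range(k):
        x = g(x)
    return x


def sign_error(k: int, delta: float, n: int = 1001) -> float:
    """Max of |sign(x) - g^k(x)| over a grid on [delta, 1] (by oddness, same on [-1, -delta])."""
    xs = [delta + (1.0 - delta) * i / (n - 1) for i in range(n)]
    return max(abs(1.0 - compose(x, k)) for x in xs)


if __name__ == "__main__":
    delta = 0.01
    for k in range(1, 13):
        # Depth k uses k copies of a fixed low-degree map, yet reaches degree up to 2^k.
        print(f"k={k:2d}  degree <= {2**k:5d}  max error on [{delta}, 1] = {sign_error(k, delta):.3e}")
```

Because $g$ is increasing on $[0, 1]$, the worst case is always at $x = \delta$. The printout shows roughly $\log_2(1/\delta)$ "warm-up" compositions, followed by a phase where the error squares at each step. For fixed $\delta$, reaching error $\varepsilon$ therefore takes only about $\log_2\log_2(1/\varepsilon)$ extra layers. The paper's bounds concern full networks and general activations, which this toy does not capture.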
