Rational Neural Networks have Expressivity Advantages
Maosen Tang, Alex Townsend

TL;DR
This paper demonstrates that neural networks with trainable rational activation functions are more expressive and parameter-efficient than traditional activations, with theoretical and practical advantages shown across various architectures.
Contribution
It studies trainable low-degree rational activation functions and shows, both theoretically and empirically, that they are more expressive and parameter-efficient than standard fixed activations.
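A trainable rational activation applies r(x) = P(x)/Q(x) elementwise, and the coefficients of P and Q are learned together with the network weights. The following is a minimal NumPy sketch that assumes type (3, 2), meaning a cubic numerator and a quadratic denominator. The class name `RationalActivation` and its default coefficients are hypothetical placeholders rather than the authors' initialization. The sketch also makes no attempt to keep Q away from zero during training.

```python
import numpy as np


class RationalActivation:
    """Elementwise rational activation r(x) = P(x) / Q(x) with trainable coefficients.

    Assumption: type (3, 2), i.e. P has degree 3 and Q has degree 2. The default
    coefficients are illustrative placeholders, not a published initialization.
    """

    def __init__(self, num_coeffs=(0.0, 0.5, 0.3, 0.05), den_coeffs=(1.0, 0.0, 0.1)):
        self.a = np.asarray(num_coeffs, dtype=float)  # a_0..a_3, ascending powers
        self.b = np.asarray(den_coeffs, dtype=float)  # b_0..b_2, ascending powers

    @staticmethod
    def _powers(x, deg):
        # Stack x^0, x^1, ..., x^deg along a new last axis.
        return np.stack([x**k for k in range(deg + 1)], axis=-1)

    def _pq(self, x):
        # Monomial features and the values of P(x) and Q(x).
        xp = self._powers(x, len(self.a) - 1)
        xq = self._powers(x, len(self.b) - 1)
        return xp, xq, xp @ self.a, xq @ self.b

    def forward(self, x):
        _, _, p, q = self._pq(np.asarray(x, dtype=float))
        return p / q  # no safeguard against Q(x) = 0 in this sketch

    def coeff_grads(self, x, upstream):
        """Gradients of sum(upstream * r(x)) with respect to a and b."""
        x = np.asarray(x, dtype=float)
        upstream = np.asarray(upstream, dtype=float)
        xp, xq, p, q = self._pq(x)
        # Quotient rule: dr/da_i = x^i / Q,  dr/db_j = -P x^j / Q^2.
        grad_a = np.tensordot(upstream / q, xp, axes=upstream.ndim)
        grad_b = np.tensordot(-upstream * p / q**2, xq, axes=upstream.ndim)
        return grad_a, grad_b
```

In an autodiff framework such as PyTorch or JAX, only the forward pass is needed. If `a` and `b` are registered as parameters, the framework derives these gradients automatically.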
Findings
Rational-activation networks can approximate fixed-activation networks with only a small overhead in size, whereas the reverse approximation can require exponentially more parameters.
Rational activations outperform fixed activations in practical training scenarios.
Theoretical separation results show exponential advantages in approximation complexity.
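One intuition for where such an exponential gap can come from is that composing rational functions multiplies their degrees. If r has type (3, 2), then clearing denominators turns r composed k times into a single rational function of type at most (3^k, 3^k − 1), while the parameter count grows only linearly in k. The following sketch checks this for a hypothetical activation r(x) = (x + 0.5x³)/(1 + x²) using `numpy.polynomial.Polynomial`, and compares the result against direct evaluation. It illustrates degree growth only and is not the paper's proof.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical type-(3, 2) activation r(x) = P(x) / Q(x); Q has no real roots.
P = Polynomial([0.0, 1.0, 0.0, 0.5])  # x + 0.5 x^3
Q = Polynomial([1.0, 0.0, 1.0])       # 1 + x^2


def compose(num, den):
    """Return (top, bot) with top / bot == r(num / den), as polynomials.

    Multiplies numerator and denominator of r(num / den) by den**m,
    where m = max(deg P, deg Q), so both become polynomials.
    """
    m = max(P.degree(), Q.degree())
    top = sum(float(c) * num**i * den**(m - i) for i, c in enumerate(P.coef))
    bot = sum(float(c) * num**j * den**(m - j) for j, c in enumerate(Q.coef))
    return top, bot


def nested_rational(k):
    """Single rational function equal to r composed k times (k >= 1)."""
    num, den = P, Q
    for _ in range(k - 1):
        num, den = compose(num, den)
    return num, den


def direct(x, k):
    """Evaluate r composed k times by applying r repeatedly."""
    for _ in range(k):
        x = P(x) / Q(x)
    return x


xs = np.linspace(-2.0, 2.0, 101)
for k in (1, 2, 3):
    num, den = nested_rational(k)
    err = np.max(np.abs(num(xs) / den(xs) - direct(xs, k)))
    print(f"k={k}: type ({num.degree()}, {den.degree()}), max deviation {err:.1e}")
```

The printed types are (3, 2), (9, 8) and (27, 26), and the two evaluations should agree to rounding error. Each extra layer adds the same seven coefficients yet triples the degree, and separation arguments of this kind exploit exactly that compositional leverage.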
Abstract
We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of , we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only overhead in size, while the converse provably requires parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate…
