Geometric Monomial (GEM): a family of rational 2N-differentiable activation functions
Eylon E. Krause

TL;DR
This paper introduces GEM, a family of smooth, rational activation functions that narrow the gap to, and in some settings surpass, traditional activations such as GELU across neural network architectures and tasks.
Contribution
The author proposes a novel family of $C^{2N}$-smooth rational activation functions, with variants that improve deep neural network training and performance.
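A short series argument shows where the $C^{2N}$ label can come from. This is a sketch under an assumed form of the base activation, not the paper's stated definition: take $f(x) = x \cdot \frac{x^{2N}}{1 + x^{2N}}$ for $x > 0$ (a log-logistic CDF gate with unit scale and shape $2N$) and $f(x) = 0$ for $x \le 0$.

$$
f(x) = \frac{x^{2N+1}}{1 + x^{2N}} = x^{2N+1} - x^{4N+1} + x^{6N+1} - \cdots, \qquad 0 < x < 1,
$$

$$
f^{(k)}(0^+) = 0 = f^{(k)}(0^-) \quad (0 \le k \le 2N), \qquad f^{(2N+1)}(0^+) = (2N+1)! \neq 0 = f^{(2N+1)}(0^-).
$$

Both branches are rational with denominators bounded below by 1, so they are smooth away from $x = 0$. At the junction the first $2N$ derivatives agree and the $(2N+1)$-th does not, so under this assumed form $f$ is $C^{2N}$ but not $C^{2N+1}$.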
Findings
GEM with $N=1$ reduces the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%.
SE-GEM surpasses GELU on CIFAR-10 + ResNet-56.
E-GEM reduces the GELU deficit on CIFAR-100 + ResNet-56 to 0.62%.
Abstract
The choice of activation function plays a crucial role in the optimization and performance of deep neural networks. While the Rectified Linear Unit (ReLU) remains the dominant choice due to its simplicity and effectiveness, its lack of smoothness may hinder gradient-based optimization in deep architectures. In this work, we propose a family of $C^{2N}$-smooth activation functions whose gate follows a log-logistic CDF, achieving ReLU-like performance with purely rational arithmetic. We introduce three variants: GEM (the base family), E-GEM (an -parameterized generalization enabling arbitrary -approximation of ReLU), and SE-GEM (a piecewise variant eliminating dead neurons with junction smoothness). An $N$-ablation study establishes $N=1$ as optimal for standard-depth networks, reducing the GELU deficit on CIFAR-100 + ResNet-56 from 6.10% to 2.12%. The smoothness…
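The abstract does not give the formula, so the following is only a minimal sketch of one reading of "x times a log-logistic CDF gate, in purely rational arithmetic". Assumptions (not from the paper): the gate is $x^{2N}/(\epsilon^{2N} + x^{2N})$ for $x > 0$ and zero for $x \le 0$ (the log-logistic law is supported on $(0, \infty)$), and E-GEM is modeled by the scale `eps`. The names `gem`, `e_gem`, `max_gap_to_relu`, and `sup_gap_closed_form` are hypothetical, and SE-GEM is not sketched. The code checks that, under these assumptions, the worst-case gap to ReLU shrinks linearly in `eps`.

```python
"""Hypothetical sketch of GEM / E-GEM as x times a log-logistic CDF gate.

Assumptions (not taken from the paper): the gate has shape 2n and scale eps,
and it is zero for x <= 0 because the log-logistic law lives on (0, inf).
"""


def e_gem(x: float, n: int = 1, eps: float = 1.0) -> float:
    """x * gate(x), with gate(x) = x^(2n) / (eps^(2n) + x^(2n)) for x > 0, else 0."""
    if x <= 0.0:
        return 0.0
    if x <= eps:
        r = (x / eps) ** (2 * n)  # r <= 1, so this power cannot overflow
        return x * r / (1.0 + r)
    r = (eps / x) ** (2 * n)  # r < 1: same gate divided through by x^(2n)
    return x / (1.0 + r)


def gem(x: float, n: int = 1) -> float:
    """Base GEM sketch: the unit-scale case eps = 1."""
    return e_gem(x, n, 1.0)


def max_gap_to_relu(n: int, eps: float, x_max: float = 10.0, steps: int = 100_000) -> float:
    """Largest ReLU(x) - e_gem(x) on a uniform grid over [0, x_max].

    For x <= 0 both functions are 0, so only the positive half-line is sampled.
    """
    grid = (x_max * i / steps for i in range(steps + 1))
    return max(x - e_gem(x, n, eps) for x in grid)


def sup_gap_closed_form(n: int, eps: float) -> float:
    """Exact sup over x > 0 of ReLU(x) - e_gem(x).

    With t = x / eps the gap is eps * t / (1 + t^(2n)); setting its derivative
    to zero gives t* = (2n - 1)^(-1/(2n)) and a peak of eps * t* (2n - 1) / (2n).
    """
    t_star = (2 * n - 1) ** (-1.0 / (2 * n))
    return eps * t_star * (2 * n - 1) / (2 * n)


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01):
        print(f"eps={eps}: grid gap={max_gap_to_relu(1, eps):.5f}, "
              f"closed form={sup_gap_closed_form(1, eps):.5f}")
```

Under these assumptions the sup-norm gap to ReLU is $c_N \epsilon$ (exactly $\epsilon/2$ for $N = 1$), which is one way to read "arbitrary approximation of ReLU". The two-branch evaluation inside `e_gem` is only a numerical-stability choice: both branches compute the same rational gate, but each keeps the powered ratio at most 1.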
