GELU Activation Function in Deep Learning: A Comprehensive Mathematical Analysis and Performance
Minhyeok Lee

TL;DR
This paper provides a detailed mathematical analysis of the GELU activation function and empirically demonstrates its superior performance over other activation functions across multiple datasets in deep learning models.
Contribution
It offers a rigorous mathematical exploration of GELU's properties and extensive experimental comparison, establishing its effectiveness in deep learning applications.
Findings
GELU outperforms traditional activation functions like ReLU in various tasks.
Mathematical properties of GELU, such as differentiability and smoothness, are thoroughly characterized.
Empirical results show GELU's superior performance on CIFAR-10, CIFAR-100, and STL-10 datasets.
Abstract
Selecting the most suitable activation function is a critical factor in the effectiveness of deep learning models, as it influences their learning capacity, stability, and computational efficiency. In recent years, the Gaussian Error Linear Unit (GELU) activation function has emerged as a dominant method, surpassing traditional functions such as the Rectified Linear Unit (ReLU) in various applications. This study presents a rigorous mathematical investigation of the GELU activation function, exploring its differentiability, boundedness, stationarity, and smoothness properties in detail. Additionally, we conduct an extensive experimental comparison of the GELU function against a broad range of alternative activation functions, utilizing a residual convolutional network trained on the CIFAR-10, CIFAR-100, and STL-10 datasets as the empirical testbed. Our results demonstrate the superior…
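For readers who want to see the function under discussion, the following is a minimal sketch of the standard GELU definition, GELU(x) = x · Φ(x), where Φ is the standard normal CDF, alongside ReLU for comparison. It is not taken from the paper's code. The helper names (`gelu`, `gelu_grad`, `relu`, `find_stationary_point`) are illustrative assumptions, and the bisection search is just one simple way to locate the point where the derivative vanishes.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    """Derivative of GELU: Phi(x) + x * phi(x), with phi the normal PDF."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def relu(x: float) -> float:
    """ReLU for comparison: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def find_stationary_point(lo: float = -2.0, hi: float = 0.0, iters: int = 60) -> float:
    """Bisection on gelu_grad over [lo, hi] (hypothetical helper).

    Assumes gelu_grad(lo) < 0 < gelu_grad(hi), which holds for the defaults.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gelu_grad(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  gelu'={gelu_grad(x):.4f}")
    x_star = find_stationary_point()
    print(f"stationary point near x={x_star:.4f}, gelu={gelu(x_star):.4f}")
```

Unlike ReLU, this function is smooth at the origin and takes small negative values for negative inputs, so it has a single interior minimum rather than a flat region. This gives a concrete picture of what the differentiability, boundedness, and stationarity properties named in the abstract refer to.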
Taxonomy
Topics: Advanced Neural Network Applications · Fault Detection and Control Systems · Industrial Vision Systems and Defect Detection
Methods: ALIGN
