Gaussian Error Linear Units (GELUs)
Dan Hendrycks, Kevin Gimpel

TL;DR
The paper introduces GELU, a neural network activation function that weights each input by the standard Gaussian cumulative distribution function evaluated at that input, and reports that it outperforms ReLU and ELU on the evaluated vision, language, and speech tasks.
Contribution
It proposes the GELU activation function, which scales each input by the standard Gaussian cumulative distribution function of its value instead of gating it by sign as ReLU does, and shows empirically that it outperforms ReLU and ELU.
Findings
GELU outperforms ReLU and ELU on all tasks the authors consider
The evaluated tasks span computer vision, natural language processing, and speech
The gains come from empirical comparison against the ReLU and ELU activations
Abstract
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gates inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
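To make the definition concrete, here is a minimal sketch of the exact GELU alongside ReLU, using only the Python standard library. It relies on the standard identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$; the function names `gelu` and `relu` are illustrative choices, not taken from the paper's code.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF."""
    # Phi(x) expressed through the error function.
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def relu(x: float) -> float:
    """ReLU: x * 1[x > 0], i.e. the input is gated by its sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

Unlike ReLU, the sketch returns small negative values for moderately negative inputs, because the input is scaled by $\Phi(x)$ rather than zeroed out; for large positive inputs both functions approach the identity.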
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Neural Networks and Applications · Advanced Neural Network Applications
