Gaussian Error Linear Units (GELUs)

Dan Hendrycks; Kevin Gimpel

arXiv:1606.08415 · cs.LG · June 7, 2023 · 3.2k cites

TL;DR

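The paper introduces GELU, a neural network activation function that weights each input by the standard Gaussian cumulative distribution function, $x\Phi(x)$. The authors report that it outperforms ReLU and ELU on the computer vision, natural language processing, and speech tasks they evaluated.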

Contribution

It proposes the GELU activation function, which weights inputs by their value through the standard Gaussian CDF instead of gating them by sign as ReLU does, and compares it empirically against ReLU and ELU.

Findings

01 GELU outperforms ReLU and ELU on all tasks considered in the paper

02 The improvements span computer vision, natural language processing, and speech tasks

03 The gains are reported consistently across the empirical evaluation

Abstract

We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
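
The definition in the abstract can be sketched in a few lines of Python. This is a minimal scalar illustration, not the authors' implementation: it uses the identity $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$ to compute the Gaussian CDF, and the function names `gelu`, `gelu_tanh`, and `relu` are chosen here for illustration. The tanh-based form is a commonly used approximation that the abstract itself does not state, so it is included as an assumption for comparison only.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard Gaussian CDF."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh-based approximation of GELU (assumption: not in the abstract)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def relu(x: float) -> float:
    """ReLU for comparison: x * 1[x > 0], i.e. gating by sign."""
    return x if x > 0 else 0.0


if __name__ == "__main__":
    # Print the three functions side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"x={x:+.1f}  gelu={gelu(x):+.4f}  "
            f"gelu_tanh={gelu_tanh(x):+.4f}  relu={relu(x):+.4f}"
        )
```

Unlike ReLU, which maps every negative input to zero, this sketch returns a small negative value for negative inputs, because the input is scaled by $\Phi(x)$ rather than switched off by its sign.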

Code & Models

Models

AmberLJC/activation_functions (model)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Neural Networks and Applications · Advanced Neural Network Applications
