Deriving Activation Functions Using Integration
Allen Hao Huang, Imanol Schlag

TL;DR
This paper introduces xIELU, a novel trainable activation function derived through integration, which adapts its gradient properties to improve language model performance, achieving lower perplexity than existing functions.
Contribution
The paper presents xIELU, a new trainable activation function derived via integration, combining properties of ReLU$^2$ and xSiLU, and demonstrates its effectiveness in large language models.
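One way to read "derived via integration" is sketched below. The symbols $\alpha_p$, $\alpha_n$, and $\beta$ are names assumed here for the trainable coefficients, and the constant of integration is fixed by assuming $f(0) = 0$; the paper's exact parameterization and constraints may differ. Start from a target gradient that is linear for positive inputs and an affine transformation of ELU for negative inputs, then integrate each piece:

$$
g(x) =
\begin{cases}
2\alpha_p x + \beta, & x > 0 \\
\alpha_n (e^{x} - 1) + \beta, & x \le 0
\end{cases}
\qquad\Longrightarrow\qquad
f(x) = \int_0^x g(t)\,dt =
\begin{cases}
\alpha_p x^2 + \beta x, & x > 0 \\
\alpha_n (e^{x} - 1 - x) + \beta x, & x \le 0
\end{cases}
$$

Under these assumptions the positive branch is a scaled ReLU$^2$ plus a linear term, while the negative branch has a gradient that tends to $\beta - \alpha_n$ as $x \to -\infty$, which is negative whenever $\alpha_n > \beta$.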
Findings
xIELU outperforms ReLU$^2$ and SwiGLU in perplexity on large language models.
xIELU's trainable parameters enable adaptive nonlinearity reduction in deep networks.
Experiments on 1.1B and 3B parameter models show lower perplexity with xIELU than with the ReLU$^2$ and SwiGLU baselines.
Abstract
Our work proposes a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding activation functions using integration. We introduce the Expanded Integral of the Exponential Linear Unit (xIELU), a trainable piecewise activation function derived by integrating trainable affine transformations applied to the Exponential Linear Unit (ELU). xIELU combines two key properties for the gradient: (1) a trainable and linearly increasing gradient for positive inputs, similar to Squared ReLU (ReLU$^2$), and (2) a trainable gradient that can take negative values for negative inputs, inspired by Expanded SiLU (xSiLU). Conceptually, xIELU can be viewed as an extension of ReLU$^2$ to handle negative inputs. The trainable parameters in xIELU allow it to adaptively reduce its nonlinearity for higher-level representations deeper in the network. In…
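The two gradient properties listed in the abstract can be checked numerically with a minimal scalar sketch. This is not the authors' implementation: the function names `xielu` and `xielu_grad`, the parameter names `alpha_p`, `alpha_n`, and `beta`, and the default values are all assumptions chosen for illustration, and the piecewise form is obtained by integrating a linear gradient (positive side) and an affine transformation of ELU (negative side) with the activation fixed to 0 at the origin.

```python
import math


def xielu(x, alpha_p=0.8, alpha_n=0.8, beta=0.5):
    """Scalar sketch of an integrated-ELU activation (assumed form).

    alpha_p, alpha_n, beta are hypothetical names and values; in a real
    model they would be trainable and possibly constrained.
    """
    if x > 0:
        # Integral of the linear gradient 2*alpha_p*t + beta from 0 to x.
        return alpha_p * x * x + beta * x
    # Integral of alpha_n*(e^t - 1) + beta from 0 to x.
    # math.expm1(x) computes e^x - 1 accurately near zero.
    return alpha_n * (math.expm1(x) - x) + beta * x


def xielu_grad(x, alpha_p=0.8, alpha_n=0.8, beta=0.5):
    """Analytic gradient of the sketch above."""
    if x > 0:
        # Linearly increasing gradient, as for a scaled squared ReLU.
        return 2.0 * alpha_p * x + beta
    # Affine transformation of ELU; negative for very negative x
    # whenever alpha_n > beta.
    return alpha_n * math.expm1(x) + beta


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 1.0, 3.0):
        print(x, xielu(x), xielu_grad(x))
```

With `alpha_p=0.0` and `alpha_n=0.0` the sketch collapses to the linear map `beta * x`, which is one way to picture how trainable coefficients could reduce the nonlinearity in deeper layers.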
Taxonomy
Topics: Neural Networks and Applications · Matrix Theory and Algorithms
Methods: Sigmoid Linear Unit · Squared ReLU · LLaMA · Exponential Linear Unit · SwiGLU
