A Method on Searching Better Activation Functions
Haoyuan Sun, Zihao Wu, Bo Xia, Pu Chang, Zibin Dong, Yifu Yuan, Yongzhe Chang, Xueqian Wang

TL;DR
This paper introduces a theoretically grounded method for designing activation functions in neural networks and uses it to derive CRReLU, which outperforms existing ReLU variants on vision tasks and GELU in LLM fine-tuning.
Contribution
The paper proposes a novel entropy-based framework for systematically designing and optimizing activation functions, including a new function CRReLU derived from ReLU.
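To make the idea of a ReLU-derived function concrete, here is a minimal sketch. It is not taken from this summary, which does not state the formula. It assumes the commonly cited form CRReLU(x) = ReLU(x) + eps * x * exp(-x^2 / 2). The name `crrelu`, the fixed `eps` argument, and its default of 0.01 are illustrative assumptions. In a trainable network eps would likely be a learnable parameter.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def crrelu(x: float, eps: float = 0.01) -> float:
    """Sketch of a ReLU with a small correction term.

    ASSUMPTION: the form ReLU(x) + eps * x * exp(-x^2 / 2) and the
    default eps = 0.01 are illustrative and not stated in the summary.
    """
    return relu(x) + eps * x * math.exp(-x * x / 2.0)


if __name__ == "__main__":
    # Compare the two functions at a few sample points.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  crrelu={crrelu(x):.4f}")
```

Under this assumed form, the correction term vanishes for large |x|, so the function matches ReLU far from zero while letting a small signal through for negative inputs near the origin.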
Findings
CRReLU outperforms existing ReLU variants on CIFAR and ImageNet datasets.
CRReLU shows superior performance in LLM fine-tuning compared to GELU.
The paper theoretically demonstrates the existence of the worst activation function with boundary conditions (WAFBC) from an information-entropy perspective.
Abstract
The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, which introduces non-linearity into the network and enables it to model sophisticated relationships in data. However, the search for activation functions has largely relied on empirical knowledge, lacking theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a solution to this issue. First, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor expansion form of the information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. The EAFO methodology presents a novel perspective for designing static activation…
Taxonomy
Topics: Neural Networks and Applications
Methods: Linear Layer · Softmax · Attention Is All You Need · Multi-Head Attention · Layer Normalization · Dense Connections · Residual Connection · Vision Transformer
