A Method on Searching Better Activation Functions

Haoyuan Sun; Zihao Wu; Bo Xia; Pu Chang; Zibin Dong; Yifu Yuan; Yongzhe Chang; Xueqian Wang

arXiv:2405.12954 · cs.LG · May 24, 2024

Open Access

TL;DR

This paper introduces a theoretically grounded method for designing better activation functions in neural networks, leading to the creation of CRReLU, which outperforms existing functions in vision and language tasks.

Contribution

The paper proposes a novel entropy-based framework for systematically designing and optimizing activation functions, and uses it to derive a new function, CRReLU, from ReLU.
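As a rough illustration of what "derived from ReLU" could look like in code, the sketch below adds a small correction term to ReLU. This page does not state CRReLU's formula: the form `max(0, x) + eps * x * exp(-x^2 / 2)` is an assumption recalled from the full paper, and the default `eps = 0.01` and the fixed (non-learnable) treatment of `eps` are simplifications made here. Check the paper before relying on it.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: passes positive inputs through, zeroes the rest."""
    return max(0.0, x)


def crrelu(x: float, eps: float = 0.01) -> float:
    """ReLU plus a small correction term (assumed form, see lead-in).

    eps scales the correction; eps = 0 recovers plain ReLU.
    The value 0.01 is an illustrative assumption, not taken from this page.
    """
    correction = eps * x * math.exp(-x * x / 2.0)
    return relu(x) + correction


if __name__ == "__main__":
    # Compare the two functions on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.6f}  crrelu={crrelu(x):.6f}")
```

Under this assumed form, the correction term lets a small signal through for negative inputs, where ReLU outputs exactly zero, and it decays quickly for large |x|, so the function stays close to ReLU away from the origin.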

Findings

01. CRReLU outperforms existing ReLU variants on the CIFAR and ImageNet datasets.

02. CRReLU shows superior performance to GELU in LLM fine-tuning.

03. The paper theoretically demonstrates the existence of the worst activation function with boundary conditions (WAFBC) from an information-entropy perspective.

Abstract

The success of artificial neural networks (ANNs) hinges greatly on the judicious selection of an activation function, which introduces non-linearity into the network and enables it to model sophisticated relationships in data. However, the search for activation functions has largely relied on empirical knowledge, lacking theoretical guidance, which has hindered the identification of more effective activation functions. In this work, we offer a solution to this issue. First, we theoretically demonstrate the existence of the worst activation function with boundary conditions (WAFBC) from the perspective of information entropy. Furthermore, inspired by the Taylor expansion form of the information entropy functional, we propose the Entropy-based Activation Function Optimization (EAFO) methodology. The EAFO methodology presents a novel perspective for designing static activation…

Taxonomy

Topics: Neural Networks and Applications

Methods: Linear Layer · Softmax · Attention Is All You Need · Multi-Head Attention · Layer Normalization · Dense Connections · Residual Connection · Vision Transformer