TL;DR
This paper introduces an evolutionary optimization approach for discovering activation functions for image classification. The resulting Exponential Error Linear Unit (EELU) functions outperform existing activation functions across several neural network architectures and datasets.
Contribution
The study presents a novel evolutionary framework for optimizing activation functions and uses it to derive the EELU family, which surpasses current state-of-the-art activation functions in image classification.
Findings
EELU functions outperform standard activation functions in 92.8% of tested cases.
The optimization scheme successfully discovers activation functions better suited for diverse neural network architectures.
The best activation function identified is $-x\,\text{erf}(e^{-x})$, demonstrating the scheme's effectiveness.
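To make the reported best function, f(x) = -x · erf(e^{-x}), concrete, here is a minimal Python sketch that evaluates it for scalar inputs using only the standard library. The helper name `eelu_best` and the overflow guard are assumptions made for this illustration; they are not taken from the paper or its code.

```python
import math


# Hypothetical helper name, chosen for this sketch only.
def eelu_best(x: float) -> float:
    """Evaluate f(x) = -x * erf(exp(-x)) for a scalar x."""
    # Assumption: for very negative x, exp(-x) would overflow a float,
    # but erf has already saturated at 1.0 there, so f(x) reduces to -x.
    if x < -700.0:
        return -x
    # For large positive x, exp(-x) underflows to 0.0 and erf(0.0) is 0.0,
    # so the result tends to zero.
    return -x * math.erf(math.exp(-x))


if __name__ == "__main__":
    # Print a few sample points to see the shape of the function.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(x, eelu_best(x))
```

For use inside a network, the same expression would be applied elementwise with a tensor library's own `exp` and `erf` operations; the scalar version here is only meant to show the formula.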
Abstract
The activation function has a significant impact on the dynamics, convergence, and performance of deep neural networks. The search for a consistent and high-performing activation function has been a long-standing pursuit in deep learning model development. Existing state-of-the-art activation functions are manually designed with human expertise, except for Swish, which was developed using a reinforcement learning-based search strategy. In this study, we propose an evolutionary approach for optimizing activation functions specifically for image classification tasks, aiming to discover functions that outperform current state-of-the-art options. Through this optimization framework, we obtain a series of high-performing activation functions denoted as Exponential Error Linear Unit (EELU). The developed activation functions are evaluated for image classification tasks from two perspectives: (1) five…
Taxonomy
Methods: Attention Is All You Need · Sigmoid Activation · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Linear Layer
