Activation Functions: Dive into an optimal activation function
Vipul Bansal

TL;DR
This paper investigates optimizing activation functions in neural networks by combining existing functions and tuning their weights, revealing layer-dependent preferences for ReLU-like or convergent functions across image datasets.
Contribution
It introduces a method to optimize activation functions as weighted sums of existing ones and analyzes their layer-wise preferences in neural networks.
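For concreteness, here is the weighted-sum activation written out with the three base functions named in the abstract. This is a sketch that assumes one unconstrained scalar weight per base function. Given the layer-wise findings, each layer presumably has its own weights, but the summary does not say whether the weights are normalized or constrained.

$$
f(x) = w_1\,\mathrm{ReLU}(x) + w_2\,\tanh(x) + w_3\,\sin(x), \qquad \frac{\partial f}{\partial w_i} = g_i(x), \quad (g_1, g_2, g_3) = (\mathrm{ReLU}, \tanh, \sin)
$$

Because each weight's gradient is simply its base function's output, the weights can be updated by backpropagation alongside the rest of the network's parameters.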
Findings
ReLU often dominates in the optimized combination.
Initial layers favor ReLU or LeakyReLU; deeper layers prefer convergent functions.
Optimized activation functions improve network performance on image datasets.
Abstract
Activation functions have emerged as one of the essential components of neural networks, and the choice of an adequate activation function can affect the accuracy of these networks. In this study, we search for an optimal activation function by defining it as a weighted sum of existing activation functions and then optimizing these weights while training the network. The study uses three activation functions, ReLU, tanh, and sin, on three popular image datasets: MNIST, FashionMNIST, and KMNIST. We observe that the ReLU activation function can easily overshadow the other activation functions. We also see that initial layers prefer ReLU- or LeakyReLU-type activation functions, whereas deeper layers tend to prefer more convergent activation functions.
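The weight-learning mechanism can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The class name `WeightedActivation`, the helper `fit_to_target`, the uniform 1/3 initialisation, and the toy objective (fitting the mixture to a fixed target curve by gradient descent, instead of training it inside a network on MNIST-style data) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


# The three base functions named in the abstract.
BASES: List[Callable[[float], float]] = [relu, math.tanh, math.sin]


class WeightedActivation:
    """Hypothetical sketch: f(x) = sum_i w_i * g_i(x), one trainable weight per base g_i.

    Assumption: weights start uniform at 1/k; the paper's initialisation is not stated here.
    """

    def __init__(self, bases: Sequence[Callable[[float], float]] = BASES) -> None:
        self.bases = list(bases)
        self.weights = [1.0 / len(self.bases)] * len(self.bases)

    def __call__(self, x: float) -> float:
        # Weighted sum of the base activations evaluated at x.
        return sum(w * g(x) for w, g in zip(self.weights, self.bases))

    def weight_grads(self, x: float, upstream: float) -> List[float]:
        # Chain rule: dL/dw_i = dL/df * df/dw_i, and df/dw_i = g_i(x).
        return [upstream * g(x) for g in self.bases]

    def step(self, grads: Sequence[float], lr: float) -> None:
        # Plain gradient descent on the mixing weights.
        self.weights = [w - lr * dw for w, dw in zip(self.weights, grads)]


def fit_to_target(act: WeightedActivation, target: Callable[[float], float],
                  xs: Sequence[float], lr: float = 0.1, steps: int = 1000) -> None:
    """Hypothetical toy loop: full-batch gradient descent on MSE between act(x) and target(x)."""
    n = len(xs)
    for _ in range(steps):
        total = [0.0] * len(act.weights)
        for x in xs:
            upstream = 2.0 * (act(x) - target(x)) / n  # dMSE/df for this sample
            for i, dw in enumerate(act.weight_grads(x, upstream)):
                total[i] += dw
        act.step(total, lr)


if __name__ == "__main__":
    xs = [-4.0 + 0.2 * k for k in range(41)]  # evenly spaced grid over [-4, 4]
    act = WeightedActivation()
    # A known mixture, so the run checks that the update rule recovers its weights.
    fit_to_target(act, lambda x: 0.5 * relu(x) + 0.5 * math.tanh(x), xs)
    print([round(w, 3) for w in act.weights])
```

The target 0.5·ReLU + 0.5·tanh is chosen only because its correct weights are known in advance, so the run should end with weights close to 0.5, 0.5, and 0. It says nothing about which functions a real network would prefer. Inside a network, the same `weight_grads` rule would be summed over every unit in a layer, with the upstream gradient supplied by backpropagation.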
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and Data Classification · Advanced Neural Network Applications
