Neural Networks with A La Carte Selection of Activation Functions
Moshe Sipper

TL;DR
This paper explores combining multiple known activation functions into neural network architectures using three methods: random generation, Optuna hyper-parameter optimization with a TPE sampler, and Optuna with a CMA-ES (evolution strategy) sampler. The resulting networks often outperform standard ReLU-based networks.
Contribution
It introduces three novel methods for selecting and combining activation functions, demonstrating their effectiveness across multiple classification tasks.
Findings
All three methods often produce significantly better results than a standard ReLU network across 25 classification problems.
Optuna with TPE sampler yields the best activation function architectures.
Combining known AFs can significantly enhance neural network performance.
Abstract
Activation functions (AFs), which are pivotal to the success (or failure) of a neural network, have received increased attention in recent years, with researchers seeking to design novel AFs that improve some aspect of network performance. In this paper we take another direction, wherein we combine a slew of known AFs into successful architectures, proposing three methods to do so beneficially: 1) generate AF architectures at random, 2) use Optuna, an automatic hyper-parameter optimization software framework, with a Tree-structured Parzen Estimator (TPE) sampler, and 3) use Optuna with a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) sampler. We show that all methods often produce significantly better results for 25 classification problems when compared with a standard network composed of ReLU hidden units and a softmax output unit. Optuna with the TPE sampler emerged as the…
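The first method in the abstract, generating AF architectures at random, can be sketched in a few lines. This is a minimal illustration and not the paper's implementation: it assumes one AF is drawn per hidden layer, and the AF pool, layer sizes, weight initialization, and all function names (`random_af_architecture`, `init_weights`, `forward`) are hypothetical choices made for the example. Only the softmax output follows the baseline described in the abstract.

```python
import math
import random

# Illustrative pool of known activation functions (an assumption, not the paper's list).
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "sigmoid": lambda x: 0.5 * (1.0 + math.tanh(0.5 * x)),  # numerically stable form
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "softsign": lambda x: x / (1.0 + abs(x)),
}


def random_af_architecture(n_hidden, rng):
    """Draw one activation name per hidden layer, uniformly at random."""
    names = sorted(ACTIVATIONS)
    return [rng.choice(names) for _ in range(n_hidden)]


def softmax(z):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def init_weights(sizes, rng):
    """Build one (weights, biases) pair per layer; weights[j][i] links input i to unit j."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def affine(weights, biases, h):
    """Compute W @ h + b for plain Python lists."""
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]


def forward(x, layers, af_names):
    """Apply each hidden layer with its own AF, then a softmax output layer."""
    h = x
    for (weights, biases), name in zip(layers[:-1], af_names):
        f = ACTIVATIONS[name]
        h = [f(v) for v in affine(weights, biases, h)]
    out_weights, out_biases = layers[-1]
    return softmax(affine(out_weights, out_biases, h))


rng = random.Random(0)
arch = random_af_architecture(n_hidden=3, rng=rng)   # one AF name per hidden layer
layers = init_weights([4, 8, 8, 8, 3], rng)          # 4 inputs, 3 hidden layers, 3 classes
probs = forward([0.5, -1.2, 0.3, 2.0], layers, arch)
print(arch, probs)
```

In a full search, each sampled architecture would be trained and scored on held-out data, keeping the best one. The two Optuna-based methods replace the uniform draw with a sampler that proposes AF choices based on earlier trial scores.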
Taxonomy
Methods: Softmax
