Learning Activation Functions to Improve Deep Neural Networks
Forest Agostinelli, Matthew Hoffman, Peter Sadowski, Pierre Baldi

TL;DR
This paper introduces a piecewise linear activation function whose parameters are learned for each neuron by gradient descent, improving performance on image classification (CIFAR-10, CIFAR-100) and on a high-energy physics benchmark.
Contribution
It proposes a novel adaptive activation function that is learned independently for each neuron, replacing a fixed nonlinearity such as the rectified linear unit and improving deep neural network performance.
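To make "learned independently for each neuron" concrete, here is a minimal pure-Python sketch of one such unit. It assumes the adaptive piecewise linear (APL) form h_i(x) = max(0, x) + Σ_{s=1..S} a_i^s · max(0, −x + b_i^s), where S (the number of hinges) is a hyperparameter and each neuron i owns its own slopes a_i^s and hinge locations b_i^s. The function name `apl` and all parameter values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of an adaptive piecewise linear (APL) unit, assuming the form
#   h(x) = max(0, x) + sum_s a_s * max(0, -x + b_s)
# The hinge count S and the parameter values below are illustrative only.

def apl(x: float, a: list[float], b: list[float]) -> float:
    """Evaluate one neuron's APL activation at pre-activation x.

    a, b: this neuron's learnable slopes and hinge locations (length S each).
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length S")
    out = max(0.0, x)  # the ReLU term
    for a_s, b_s in zip(a, b):
        out += a_s * max(0.0, -x + b_s)  # hinge is active only for x < b_s
    return out


# Each neuron carries its own (a, b), so two neurons can take different shapes.
# With a = 0 every hinge is switched off and the unit reduces to a plain ReLU.
neuron_params = [
    ([0.0, 0.0], [0.0, 0.0]),    # behaves exactly like ReLU
    ([-0.5, 0.2], [1.0, -1.0]),  # a non-ReLU shape with two hinges
]
for a, b in neuron_params:
    print([round(apl(x, a, b), 3) for x in (-2.0, 0.0, 2.0)])
```

Because a = 0 recovers a ReLU, this parameterization can start from the static baseline and move away from it only where the data favors a different shape.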
Findings
Achieved state-of-the-art results (at the time of publication) on CIFAR-10 (7.51% error) and CIFAR-100 (30.83% error).
Improved performance on a high-energy physics benchmark involving Higgs boson decay modes.
Showed that networks with learned activation functions outperform the same architectures built from static rectified linear units.
Abstract
Artificial neural networks typically have a fixed, non-linear activation function at each neuron. We have designed a novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent. With this adaptive activation function, we are able to improve upon deep neural network architectures composed of static rectified linear units, achieving state-of-the-art performance on CIFAR-10 (7.51%), CIFAR-100 (30.83%), and a benchmark from high-energy physics involving Higgs boson decay modes.
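The abstract states that the activation is learned "using gradient descent." A minimal sketch of that step follows, under simplifying assumptions: a single neuron with S = 1 hinge, the APL form h(x) = max(0, x) + Σ_s a_s · max(0, −x + b_s), and a toy regression target |x| in place of a real network and loss. The partial derivatives are ∂h/∂a_s = max(0, −x + b_s) and ∂h/∂b_s = a_s when x < b_s (0 otherwise); in a real network these gradients would flow through backpropagation alongside the weight gradients, whereas here the activation parameters are fit in isolation. All function names (`apl`, `apl_grads`, `mse`, `fit_apl`) are hypothetical.

```python
# Sketch: fitting one neuron's APL parameters (a, b) by full-batch gradient
# descent on a toy target. Not the authors' training setup.

def apl(x, a, b):
    """APL activation: ReLU plus S learnable hinges."""
    return max(0.0, x) + sum(a_s * max(0.0, -x + b_s) for a_s, b_s in zip(a, b))


def apl_grads(x, a, b):
    """Partial derivatives of apl(x, a, b) with respect to each a_s and b_s."""
    da = [max(0.0, -x + b_s) for b_s in b]
    # Subgradient choice: use 0 exactly at the kink x == b_s.
    db = [a_s if x < b_s else 0.0 for a_s, b_s in zip(a, b)]
    return da, db


def mse(xs, ys, a, b):
    return sum((apl(x, a, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_apl(xs, ys, a, b, lr=0.05, steps=2000):
    a, b = list(a), list(b)  # copy so the caller's initial values are untouched
    n = len(xs)
    for _ in range(steps):
        ga = [0.0] * len(a)
        gb = [0.0] * len(b)
        for x, y in zip(xs, ys):
            err = apl(x, a, b) - y          # d(loss)/d(output) is 2 * err / n
            da, db = apl_grads(x, a, b)
            for s in range(len(a)):
                ga[s] += 2.0 * err * da[s] / n
                gb[s] += 2.0 * err * db[s] / n
        a = [a_s - lr * g for a_s, g in zip(a, ga)]
        b = [b_s - lr * g for b_s, g in zip(b, gb)]
    return a, b


xs = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
ys = [abs(x) for x in xs]           # |x| is not a ReLU, but APL can fit it
a0, b0 = [0.0], [0.5]               # start with the hinge switched off
a_fit, b_fit = fit_apl(xs, ys, a0, b0)
print("loss before:", mse(xs, ys, a0, b0))
print("loss after: ", mse(xs, ys, a_fit, b_fit))
print("learned a, b:", a_fit, b_fit)
```

The target was chosen because a fixed ReLU cannot represent |x|, while a single hinge can do so exactly (a = 1, b = 0), so the drop in loss comes entirely from learning the activation's shape.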
Taxonomy
Topics: Computational Physics and Python Applications · Seismic Imaging and Inversion Techniques · Medical Imaging Techniques and Applications
