Data-aware customization of activation functions reduces neural network error
Fuchang Gao, Boyu Zhang

TL;DR
Customizing activation functions to the characteristics of the data can substantially reduce neural network error. The paper proposes criteria for good activation functions, introduces a new 'seagull' activation function, and demonstrates substantial improvements across a variety of tasks.
Contribution
The paper introduces a data-aware approach to customizing activation functions, including a new 'seagull' function, and provides theoretical criteria and empirical evidence of error reduction.
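This summary does not give the seagull function's formula. The minimal sketch below assumes it is log(1 + x^2); treat that as an assumption, since the formula does not appear here. Under that assumption, the sketch shows the two properties that matter for the argument: the function is even, and its gradient is smooth and bounded. The names `seagull` and `seagull_grad` are illustrative only and are not taken from any released code.

```python
import math


def seagull(x: float) -> float:
    """Assumed 'seagull' activation: log(1 + x^2) (an assumption, see lead-in)."""
    # log1p(t) computes log(1 + t) accurately when t is small.
    return math.log1p(x * x)


def seagull_grad(x: float) -> float:
    """Derivative of the assumed form: d/dx log(1 + x^2) = 2x / (1 + x^2)."""
    return 2.0 * x / (1.0 + x * x)


if __name__ == "__main__":
    print(seagull(0.0))                   # 0.0 (global minimum at the origin)
    print(seagull(-2.0) == seagull(2.0))  # True (even function)
    print(seagull_grad(1.0))              # 1.0 (largest slope magnitude, reached at x = +/-1)
```

Unlike ReLU or tanh, this function is even. Its slope stays within [-1, 1], and it grows only logarithmically for large |x|.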
Findings
Order-of-magnitude error reduction with 'seagull' activation.
Best results when the activation substitution is applied to the layer where the exchangeable variables are first connected.
Effective in both low- and high-dimensional datasets.
Abstract
Activation functions play critical roles in neural networks, yet current off-the-shelf neural networks pay little attention to the specific choice of activation functions used. Here we show that data-aware customization of activation functions can result in striking reductions in neural network error. We first give a simple linear algebraic explanation of the role of activation functions in neural networks; then, through connection with the Diaconis-Shahshahani Approximation Theorem, we propose a set of criteria for good activation functions. As a case study, we consider regression tasks with a partially exchangeable target function, i.e. $f(u,v,w)=f(v,u,w)$ for $u,v\in\mathbb{R}^d$ and $w\in\mathbb{R}^k$, and prove that for such a target function, using an even activation function in at least one of the layers guarantees that the prediction preserves partial exchangeability…
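The abstract claims that an even activation in at least one layer can preserve partial exchangeability, meaning the target f(u, v, w) is unchanged when u and v are swapped. The small sketch below illustrates that mechanism. It is a hypothetical two-branch network chosen for illustration, not the paper's architecture. It also assumes the seagull activation is log(1 + x^2), since this page does not state the formula.

The construction works as follows. Swapping u and v flips the sign of u - v and leaves u + v unchanged. Bias-free layers with odd activations (tanh) carry that sign flip forward. A single even layer then removes the sign. `make_net`, `matvec`, and the `even` flag are illustrative names only.

```python
import math
import random


def seagull(x: float) -> float:
    # ASSUMPTION: 'seagull' taken to be log(1 + x^2), an even function.
    return math.log1p(x * x)


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    # Plain matrix-vector product with no bias term.
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def make_net(d: int, k: int, hidden: int = 8, even: bool = True, seed: int = 0):
    """Hypothetical randomly initialized two-branch network f(u, v, w).

    Difference branch on u - v: a bias-free tanh layer (odd), then a bias-free
    layer whose activation is seagull (even) or, with even=False, tanh (odd).
    Sum branch on (u + v, w): an ordinary affine layer with ReLU.
    """
    rng = random.Random(seed)

    def gauss() -> float:
        return rng.gauss(0.0, 1.0)

    a1 = [[gauss() for _ in range(d)] for _ in range(hidden)]
    a2 = [[gauss() for _ in range(hidden)] for _ in range(hidden)]
    b = [[gauss() for _ in range(d + k)] for _ in range(hidden)]
    b_bias = [gauss() for _ in range(hidden)]
    out = [gauss() for _ in range(2 * hidden)]
    out_bias = gauss()
    act2 = seagull if even else math.tanh

    def f(u: list[float], v: list[float], w: list[float]) -> float:
        diff = [ui - vi for ui, vi in zip(u, v)]   # flips sign when u and v swap
        total = [ui + vi for ui, vi in zip(u, v)]  # unchanged when u and v swap
        # Odd activation and no bias: this layer's output also flips sign.
        h_odd = [math.tanh(z) for z in matvec(a1, diff)]
        # An even activation here discards the sign, making the branch symmetric.
        h_diff = [act2(z) for z in matvec(a2, h_odd)]
        pre = matvec(b, total + list(w))
        h_sum = [max(0.0, z + c) for z, c in zip(pre, b_bias)]
        features = h_diff + h_sum
        return sum(o * h for o, h in zip(out, features)) + out_bias

    return f


if __name__ == "__main__":
    u, v, w = [0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.1, 0.9]
    f_even = make_net(d=3, k=2, even=True)
    f_odd = make_net(d=3, k=2, even=False)
    print(abs(f_even(u, v, w) - f_even(v, u, w)) < 1e-12)  # True
    print(abs(f_odd(u, v, w) - f_odd(v, u, w)) < 1e-12)
```

With `even=False`, the difference branch stays odd in u - v, so swapping u and v generally changes the output. The only change is the activation in one layer, which mirrors the abstract's point that the choice of activation, not only the weights, determines whether the symmetry holds.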
Taxonomy
Topics: Machine Learning in Materials Science · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
