Optimal Nonlinearities Improve Generalization Performance of Random Features
Samet Demir, Zafer Doğan

TL;DR
This paper introduces optimal nonlinear activation functions for random feature models and shows that they improve generalization and mitigate the double descent phenomenon on regression and classification tasks, including CIFAR10.
Contribution
The paper identifies a set of optimal nonlinearities, defined through the parameters of the equivalent Gaussian model, that generalize better than standard activation functions such as ReLU.
Findings
Optimized nonlinearities outperform ReLU in generalization tasks.
The proposed functions mitigate the double descent phenomenon.
Experimental validation on synthetic and real data supports the theoretical claims.
Abstract
A random feature model with a nonlinear activation function has been shown to be asymptotically equivalent to a Gaussian model in terms of training and generalization errors. Analysis of the equivalent model reveals that the activation function plays an important yet not fully understood role. To address this issue, we study the "parameters" of the equivalent model to achieve improved generalization performance for a given supervised learning problem. We show that the parameters acquired from the Gaussian model enable us to define a set of optimal nonlinearities. We provide two example classes from this set: second-order polynomial and piecewise linear functions. These functions are optimized to improve generalization performance regardless of their actual form. We experiment with regression and classification problems, including synthetic and real (e.g., CIFAR10) data. Our numerical…
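The idea that only the "parameters" of the equivalent Gaussian model matter, and not the actual form of the activation, can be illustrated with a small numerical sketch. It assumes the parameterization common in the Gaussian-equivalence literature, where an activation enters through three moments under z ~ N(0, 1): its mean, its linear coefficient, and the standard deviation of the remaining nonlinear part. The names `mu0`, `mu1`, `mu_star`, and both helper functions are hypothetical and are not taken from the paper; the paper's own notation and its optimal parameter values are not given in this summary. The sketch only shows that a second-order polynomial can be constructed to share these three moments with ReLU.

```python
import numpy as np

# Dense grid used to approximate expectations over z ~ N(0, 1).
_Z = np.linspace(-10.0, 10.0, 200_001)
_W = np.exp(-_Z**2 / 2) / np.sqrt(2 * np.pi) * (_Z[1] - _Z[0])


def gaussian_moments(sigma):
    """Return (mu0, mu1, mu_star) of an activation under z ~ N(0, 1).

    Assumed definitions (hypothetical notation, not from the paper):
      mu0       = E[sigma(z)]
      mu1       = E[z * sigma(z)]
      mu_star^2 = E[sigma(z)^2] - mu0^2 - mu1^2
    """
    s = sigma(_Z)
    mu0 = np.sum(s * _W)
    mu1 = np.sum(_Z * s * _W)
    second = np.sum(s**2 * _W)
    mu_star = np.sqrt(max(second - mu0**2 - mu1**2, 0.0))
    return mu0, mu1, mu_star


def matched_quadratic(mu0, mu1, mu_star):
    """Build a + b*z + c*z^2 with the requested three moments.

    For this polynomial: E[sigma] = a + c, E[z*sigma] = b, and the
    nonlinear remainder c*(z^2 - 1) has variance 2*c^2.
    """
    c = mu_star / np.sqrt(2.0)
    a = mu0 - c
    b = mu1
    return lambda z: a + b * z + c * z**2


def relu(z):
    return np.maximum(z, 0.0)


relu_moments = gaussian_moments(relu)
quad = matched_quadratic(*relu_moments)
quad_moments = gaussian_moments(quad)

print("ReLU moments:     ", relu_moments)
print("Quadratic moments:", quad_moments)
```

Under the stated assumption, the two activations printed above would be interchangeable in the equivalent Gaussian model even though their shapes differ. Choosing the target moments to minimize generalization error, as the paper proposes, is outside the scope of this sketch.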
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Gaussian Processes and Bayesian Inference
