Universal Smoothness via Bernstein Polynomials: A Constructive Approximation Approach for Activation Functions
Wentao Zhang, Yutong Zhang, Yifan Zhu, Wentao Mo

TL;DR
This paper introduces the Bernstein Linear Unit (BerLU), a smooth, computationally efficient activation function based on Bernstein polynomials, designed to improve optimization stability and performance in deep neural networks.
Contribution
It presents a novel activation function using Bernstein polynomials that guarantees smoothness, stability, and efficiency, with theoretical and empirical validation.
Findings
BerLU is continuously differentiable and has a Lipschitz constant of one.
Empirical results show BerLU outperforms state-of-the-art activations on image classification.
BerLU offers superior computational and memory efficiency in deep architectures.
Abstract
The efficacy of deep neural networks is heavily reliant on the design of non-linear activation functions, yet existing approaches often struggle to balance optimization stability with computational efficiency. While piecewise linear functions offer inference speed, they suffer from optimization instability due to non-differentiability at the origin, whereas smooth counterparts typically incur significant computational overhead through their reliance on transcendental operations. To address these limitations, this paper proposes a general smoothing framework based on constructive approximation theory and introduces the Bernstein Linear Unit (BerLU). This novel activation function utilizes Bernstein polynomials to construct a differentiable quadratic transition region that effectively eliminates singularities while maintaining a piecewise linear structure. Theoretical analysis…
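The abstract does not give BerLU's exact formula, so the following is only a minimal sketch of the general idea it describes, not the authors' definition. It assumes a ReLU-like function whose kink is replaced on a window [-eps, eps] by a degree-2 Bernstein polynomial with control values 0, 0, eps. The names `smoothed_relu`, `smoothed_relu_grad`, and the parameter `eps` are hypothetical and chosen for illustration.

```python
def smoothed_relu(x: float, eps: float = 1.0) -> float:
    """ReLU with a quadratic Bernstein transition on [-eps, eps].

    Illustrative only: the window half-width `eps` and the control
    values (0, 0, eps) are assumptions, not taken from the paper.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    if x <= -eps:
        return 0.0          # left linear piece (constant zero)
    if x >= eps:
        return x            # right linear piece (identity)
    # Map the transition window [-eps, eps] onto t in [0, 1].
    t = (x + eps) / (2.0 * eps)
    # Degree-2 Bernstein basis polynomials.
    b0 = (1.0 - t) ** 2
    b1 = 2.0 * t * (1.0 - t)
    b2 = t ** 2
    # Control values 0, 0, eps match the value and slope of both linear pieces.
    return 0.0 * b0 + 0.0 * b1 + eps * b2


def smoothed_relu_grad(x: float, eps: float = 1.0) -> float:
    """Derivative of smoothed_relu; it rises linearly from 0 to 1 inside the window."""
    if x <= -eps:
        return 0.0
    if x >= eps:
        return 1.0
    return (x + eps) / (2.0 * eps)
```

Under these assumptions the transition reduces to (x + eps)^2 / (4 * eps), which uses only additions and multiplications and no transcendental operations. Its derivative is continuous and stays within [0, 1], which is consistent with the continuous differentiability and unit Lipschitz constant listed in the findings.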
