Universal Smoothness via Bernstein Polynomials: A Constructive Approximation Approach for Activation Functions

Wentao Zhang; Yutong Zhang; Yifan Zhu; Wentao Mo

arXiv:2605.02591 · cs.AI · May 5, 2026

TL;DR

This paper introduces Bernstein Linear Unit (BerLU), a smooth, computationally efficient activation function based on Bernstein polynomials, improving stability and performance in deep neural networks.

Contribution

It presents a novel activation function built from Bernstein polynomials that guarantees smoothness and stability while remaining efficient. These properties are backed by theoretical analysis and empirical validation.

Findings

01. BerLU is continuously differentiable and has a Lipschitz constant of one.

02. Empirical results show that BerLU outperforms state-of-the-art activation functions on image classification.

03. BerLU offers superior computational and memory efficiency in deep architectures.
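The first finding can be checked directly for one assumed form of the transition region. The paper's exact parameterization is not reproduced here. In this form, ReLU has its kink at the origin replaced, on a hypothetical interval $[-a, a]$ with $a > 0$, by the quadratic that the degree-2 Bernstein polynomial of ReLU yields on that interval:

```latex
f_a(x) =
\begin{cases}
0, & x \le -a,\\[2pt]
\dfrac{(x+a)^2}{4a}, & -a < x < a,\\[2pt]
x, & x \ge a,
\end{cases}
\qquad
f_a'(x) =
\begin{cases}
0, & x \le -a,\\[2pt]
\dfrac{x+a}{2a}, & -a < x < a,\\[2pt]
1, & x \ge a.
\end{cases}
```

Value and slope agree at both junctions: $f_a(-a)=0$ and $f_a'(-a)=0$, while $f_a(a)=a$ and $f_a'(a)=1$. So $f_a$ is continuously differentiable. Because $0 \le f_a'(x) \le 1$ everywhere, the mean value theorem gives $|f_a(x)-f_a(y)| \le |x-y|$. The bound is attained for $x, y \ge a$, so the Lipschitz constant is exactly one under this assumed form.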

Abstract

The efficacy of deep neural networks relies heavily on the design of non-linear activation functions, yet existing approaches often struggle to balance optimization stability with computational efficiency. Piecewise linear functions offer fast inference, but they suffer from optimization instability due to non-differentiability at the origin. Smooth counterparts, by contrast, typically incur significant computational overhead because they rely on transcendental operations. To address these limitations, this paper proposes a general smoothing framework based on constructive approximation theory and introduces the Bernstein Linear Unit (BerLU). This activation function uses Bernstein polynomials to construct a differentiable quadratic transition region that removes the non-differentiable point while maintaining a piecewise linear structure. Theoretical analysis…
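The abstract stops short of giving a formula. One construction that fits its description, offered as a sketch under stated assumptions, works as follows. Apply the degree-$n$ Bernstein operator $B_n(g)(t)=\sum_{k=0}^{n} g(k/n)\binom{n}{k}t^k(1-t)^{n-k}$ to ReLU on a hypothetical transition interval $[-a, a]$, and keep ReLU unchanged outside that interval. Bernstein polynomials reproduce the endpoint values, so the pieces join continuously, and for $n = 2$ the inner piece is the quadratic $(x+a)^2/(4a)$. The names `bernstein` and `berlu_sketch` and the width parameter `a` are illustrative assumptions, not the paper's code.

```python
from math import comb


def bernstein(g, n, t):
    """Evaluate the degree-n Bernstein polynomial of g at t in [0, 1]."""
    return sum(g(k / n) * comb(n, k) * t**k * (1 - t) ** (n - k) for k in range(n + 1))


def berlu_sketch(x, a=1.0, n=2):
    """Hypothetical BerLU-style activation (assumed form, not the paper's code).

    Outside [-a, a] it is exactly ReLU; inside, ReLU is replaced by its
    degree-n Bernstein polynomial on that interval (n=2 gives a quadratic).
    """
    if x <= -a:
        return 0.0
    if x >= a:
        return float(x)

    def relu_in_t(s):
        # ReLU rewritten in the local coordinate s in [0, 1], where x = 2*a*s - a.
        return max(0.0, 2 * a * s - a)

    t = (x + a) / (2 * a)  # map x in (-a, a) to t in (0, 1)
    return bernstein(relu_in_t, n, t)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        # For n=2 the middle branch equals (x + a)**2 / (4 * a).
        print(x, berlu_sketch(x))
```

For $n = 2$, the middle branch reduces to the closed form `(x + a) ** 2 / (4 * a)`. That form needs only additions and multiplications, with no transcendental functions, which is the efficiency argument the abstract makes. The generic `bernstein` loop is there for exposition only. A practical kernel would evaluate the closed form with vectorized operations.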
