Exponential Approximation Rates and Parameter Efficiency of Learnable Bernstein Activations
Ibrahim Albool, Malak Gamal El-Din, Salma Elmalaki, Yasser Shoukry

TL;DR
This paper introduces learnable Bernstein polynomial activations in deep neural networks, demonstrating exponential approximation error decay and significant parameter efficiency improvements over traditional activations.
Contribution
The paper provides a theoretical analysis of DeepBern-Nets (DBNs), networks with learnable Bernstein polynomial activations, showing exponential approximation rates and validating them with experiments on the HIGGS and SUSY datasets.
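For orientation, the standard textbook form of a degree-$n$ Bernstein polynomial on an interval $[l, u]$ is given below. The symbols $n$, $l$, $u$, and $c_k$ are notation assumed here for illustration and are not taken from this summary. In a learnable Bernstein activation, the coefficients $c_k$ would presumably be the trainable parameters.

```latex
\sigma(x) \;=\; \sum_{k=0}^{n} c_k \, b_{k,n}(x),
\qquad
b_{k,n}(x) \;=\; \binom{n}{k}\,
\frac{(x-l)^{k}\,(u-x)^{\,n-k}}{(u-l)^{n}},
\qquad x \in [l, u].
```

Because the basis functions $b_{k,n}$ are non-negative and sum to one on $[l, u]$, the output always lies between $\min_k c_k$ and $\max_k c_k$, and $\sigma(l) = c_0$, $\sigma(u) = c_n$.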
Findings
DBNs achieve over 70% parameter reduction compared to ReLU-based networks.
DBNs converge faster, reaching ReLU's loss in as few as 26% of training epochs.
DBNs attain up to 45% lower final loss than traditional activation functions.
Abstract
The choice of activation function fundamentally shapes the representational capacity and parameter efficiency of deep neural networks, yet most widely used activations lack rigorous theoretical guarantees on these properties. We provide a theoretical analysis of DeepBern-Nets (DBNs), networks employing learnable Bernstein polynomial activations, showing that their approximation error decays with the network depth and the polynomial order exponentially faster than the polynomial rate of ReLU architectures, while remaining fully differentiable. We validate these predictions through experiments on large scientific datasets (HIGGS and SUSY), comparing DBNs against ReLU, Leaky ReLU, SELU, and GeLU. DBNs achieve over 70% parameter reduction across the majority of architectures, with a higher reduction at scale, and converge to ReLU's…
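To make "learnable Bernstein polynomial activation" concrete, here is a minimal scalar sketch in plain Python. It is not the authors' implementation. The function name `bernstein_activation`, the default interval `[-1, 1]`, and the clipping of out-of-range inputs are all assumptions made for illustration. In a real network the list `coeffs` would be a trainable parameter vector per neuron or layer.

```python
from math import comb


def bernstein_activation(x, coeffs, lower=-1.0, upper=1.0):
    """Evaluate a degree-n Bernstein polynomial at a scalar x.

    coeffs holds n + 1 coefficients c_0..c_n (the would-be learnable
    parameters). The interval [lower, upper] is a hypothetical input range.
    """
    n = len(coeffs) - 1
    # Map x to t in [0, 1]; clipping is an assumption of this sketch so that
    # inputs outside the interval do not leave the basis's valid domain.
    t = (x - lower) / (upper - lower)
    t = min(max(t, 0.0), 1.0)
    # Sum c_k * C(n, k) * t^k * (1 - t)^(n - k) over k = 0..n.
    return sum(
        c * comb(n, k) * t**k * (1.0 - t) ** (n - k)
        for k, c in enumerate(coeffs)
    )
```

Two properties are easy to check with this sketch: constant coefficients give a constant output (the basis sums to one), and the endpoints of the interval return the first and last coefficient, which is what makes output bounds cheap to read off from the coefficients.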
