Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations

Yuhao Liu; Zilin Wang; Lei Wu; Shaobo Zhang

arXiv:2602.19691 · stat.ML · March 3, 2026


Open Access

TL;DR

This paper shows that neural networks with smooth activation functions adapt to the smoothness of the target function: at constant depth, increasing width alone achieves minimax-optimal approximation and estimation rates (up to logarithmic factors), whereas ReLU networks need depth that grows with the target smoothness.

Contribution

It identifies activation smoothness as a fundamental mechanism for attaining optimal rates in neural networks and develops a multi-scale approximation framework for the analysis.

Findings

01. Smooth activations enable smoothness adaptivity through width alone, at constant depth.

02. ReLU networks need depth that grows with the target smoothness.

03. The paper constructs explicit neural network approximators with controlled complexity.
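The contrast between smooth and ReLU activations summarized above can be made concrete with a classical one-dimensional mechanism. This is not the paper's multi-scale construction, which this page does not describe. A smooth activation lets a fixed-depth, width-3 layer emulate the monomial $x^2$ through a second finite difference, because $\tanh''(b) \neq 0$. The same difference taken with ReLU vanishes on its linear pieces. The helpers `approx_square` and `relu_second_difference`, the bias `b = 0.5`, and the step `h` are hypothetical choices made for illustration.

```python
import numpy as np


def approx_square(x, h=1e-2, b=0.5):
    """Hypothetical helper: approximate x**2 on [0, 1] with three tanh neurons.

    Central second difference of tanh around the bias b:
        tanh(b + h x) - 2 tanh(b) + tanh(b - h x) = (h x)^2 tanh''(b) + O(h^4),
    so dividing by h^2 tanh''(b) recovers x^2 up to an O(h^2) error.
    """
    t = np.tanh(b)
    d2 = -2.0 * t * (1.0 - t**2)  # tanh''(b); nonzero whenever b != 0
    # Hidden layer of width 3: input weights (h, 0, -h) and shared bias b.
    hidden = np.tanh(np.stack([b + h * x, np.full_like(x, b), b - h * x]))
    # Linear output layer with weights (1, -2, 1) / (h^2 tanh''(b)).
    out_weights = np.array([1.0, -2.0, 1.0]) / (h**2 * d2)
    return out_weights @ hidden


def relu_second_difference(x, h=1e-2, b=0.5):
    """Same second difference with ReLU. For b > h >= 0 and x in [0, 1], every
    argument is positive, so ReLU acts linearly and the result is zero up to rounding."""
    relu = lambda z: np.maximum(z, 0.0)
    return relu(b + h * x) - 2.0 * relu(np.full_like(x, b)) + relu(b - h * x)


x = np.linspace(0.0, 1.0, 101)
for h in (1e-1, 1e-2):
    err = np.max(np.abs(approx_square(x, h=h) - x**2))
    print(f"h={h:g}: max |approx - x^2| = {err:.2e}")
print("ReLU second difference, max abs:", np.max(np.abs(relu_second_difference(x))))
```

With width held at three, shrinking `h` reduces the error roughly like $h^2$ until floating-point cancellation takes over. ReLU's second derivative is zero wherever it is defined, so the same normalization cannot even be formed. Producing curvature with ReLU requires more pieces, and therefore more neurons or layers.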

Abstract

Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural networks with smooth activations for learning functions in the Sobolev space $W^{s,\infty}([0,1]^d)$ with $s > 0$. We prove that constant-depth networks equipped with smooth activations achieve smoothness adaptivity: increasing width alone suffices to attain the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In contrast, for non-smooth activations such as ReLU, smoothness adaptivity is fundamentally limited by depth: the attainable approximation order is bounded by depth, and higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, complementary…
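For orientation, a standard bias-variance heuristic shows where the minimax-optimal estimation rate for this class comes from. This is background, not the paper's proof. It rests on two assumptions. First, a network class with $N$ parameters approximates every target in the unit ball of $W^{s,\infty}([0,1]^d)$ to sup-norm accuracy of order $N^{-s/d}$. Second, the estimation term of least squares over that class, fit to $n$ noisy samples, scales like $N \log n / n$. Balancing the two terms gives the classical rate $n^{-2s/(2s+d)}$, which matches the minimax lower bound for this class up to the logarithm. Constants are omitted.

```latex
\begin{aligned}
\mathbb{E}\,\|\hat f_n - f^*\|_{L^2}^2
  &\lesssim \underbrace{N^{-2s/d}}_{(\text{approximation error})^2}
   + \underbrace{\frac{N \log n}{n}}_{\text{estimation error}}
   \;\asymp\; n^{-2s/(2s+d)} \log n
   \quad \text{for } N \asymp n^{d/(2s+d)}, \\
\inf_{\hat f_n} \sup_{\|f^*\|_{W^{s,\infty}} \le 1}
  \mathbb{E}\,\|\hat f_n - f^*\|_{L^2}^2
  &\gtrsim n^{-2s/(2s+d)}.
\end{aligned}
```

According to the abstract, the paper's point is that constant-depth networks with smooth activations reach these rates by increasing width alone, whereas for ReLU the attainable approximation order is capped by depth. The exact network sizes and logarithmic factors are given in the paper, not reproduced here.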


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning