Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations
Yuhao Liu, Zilin Wang, Lei Wu, Shaobo Zhang

TL;DR
This paper shows that neural networks with smooth activation functions adapt to the smoothness of the target function: at constant depth, increasing width alone achieves optimal approximation and estimation rates, whereas ReLU networks require depth to grow with the smoothness order.
Contribution
The paper establishes activation smoothness as a fundamental mechanism for attaining optimal rates in neural networks, and provides a multi-scale approximation framework for the analysis.
Findings
Smooth activations enable width-only adaptivity to smoothness.
ReLU networks require depth growth for higher smoothness.
Explicit neural network approximators with controlled complexity are constructed.
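To make the first two findings concrete, here is a minimal sketch of a classical finite-difference argument; it is not the paper's multi-scale construction. It assumes a softplus activation and a target of x^2 on [-1, 1], and the names `approx_square` and `max_error` are hypothetical. A single hidden layer with two softplus units reproduces x^2 up to an error of order h^2, because the second derivative of softplus at 0 is 1/4 and the symmetric second difference isolates the quadratic Taylor term.

```python
import math


def softplus(t):
    # Numerically stable log(1 + exp(t)); a smooth activation.
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def approx_square(x, h=0.01):
    # One hidden layer, two units: softplus(h*x) and softplus(-h*x).
    # Their sum minus 2*softplus(0) is a symmetric second difference,
    # which equals (h*x)^2 * softplus''(0) up to fourth-order terms.
    second_derivative_at_zero = 0.25  # softplus''(0) = sigmoid'(0) = 1/4
    hidden = softplus(h * x) + softplus(-h * x)
    bias = -2.0 * softplus(0.0)
    return (hidden + bias) / (h * h * second_derivative_at_zero)


def max_error(h, grid_size=201):
    # Worst-case error against x^2 on a uniform grid over [-1, 1].
    xs = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(approx_square(x, h) - x * x) for x in xs)


if __name__ == "__main__":
    for h in (0.1, 0.01):
        print(f"h={h}: max error on [-1, 1] = {max_error(h):.2e}")
```

Shrinking h tightens the approximation without adding layers or units. A one-hidden-layer ReLU network is piecewise linear, so the same two-unit trick cannot recover the quadratic term.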
Abstract
Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we study both approximation and statistical properties of neural networks with smooth activations for learning functions in Sobolev spaces. We prove that constant-depth networks equipped with smooth activations achieve smoothness adaptivity: increasing width alone suffices to attain the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In contrast, for non-smooth activations such as ReLU, smoothness adaptivity is fundamentally limited by depth: the attainable approximation order is bounded by depth, and higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, complementary…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
