Approximation with SiLU Networks: Constant Depth and Exponential Rates for Basic Operations

Koffi O. Ayena

arXiv:2512.12132·cs.LG·February 24, 2026


Open Access

TL;DR

This paper demonstrates how properly tuned SiLU neural networks can efficiently approximate functions like $x^2$ with constant depth and exponential rates, emphasizing the importance of hyperparameter optimization.

Contribution

The paper introduces SiLU network constructions that, under optimally tuned hyperparameters, achieve constant depth and exponential approximation rates, and extends them to Sobolev spaces.

Findings

01

Two-layer networks approximate $x^2$ with error $ε$ using weights scaling as $β^{\pm k}$.

02

Networks with depth $O(1)$ and $O(ε^{-d/n})$ parameters approximate Sobolev functions.

03

Hyperparameter tuning critically influences the approximation efficiency of SiLU networks.

Abstract

We present SiLU network constructions whose approximation efficiency depends critically on proper hyperparameter tuning. For the square function $x^{2}$, with optimally chosen shift $a$ and scale $β$, we achieve approximation error $ε$ using a two-layer network of constant width, where weights scale as $β^{\pm k}$ with $k = O(\ln(1/ε))$. We then extend this approach through functional composition to Sobolev spaces, obtaining networks with depth $O(1)$ and $O(ε^{-d/n})$ parameters under optimal hyperparameter settings. Our work highlights the trade-off between architectural depth and activation parameter optimization in neural network approximation theory.
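To make the role of the scale $β$ concrete, here is a minimal Python sketch of a two-neuron SiLU approximation of $x^2$. It is not the paper's construction: it omits the shift $a$ and the $β^{\pm k}$ weight scaling, and relies only on the identity $\mathrm{SiLU}(u) + \mathrm{SiLU}(-u) = u \tanh(u/2)$, whose Taylor expansion gives an error of at most $x^4/(12β^2)$. This simplified version therefore converges polynomially in $β$, not at the exponential rate stated in the abstract. The function names `square_approx` and `max_error` are hypothetical.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def square_approx(x: float, beta: float) -> float:
    """Illustrative two-neuron SiLU approximation of x**2.

    Assumption: this symmetric combination is a simplified stand-in,
    not the construction from the paper (no shift a, no beta**(+-k)).
    Inner weights are +-1/beta, the outer weight is 2 * beta**2.
    """
    u = x / beta
    return 2.0 * beta**2 * (silu(u) + silu(-u))


def max_error(beta: float, samples: int = 2001) -> float:
    """Largest absolute error over an evenly spaced grid on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(square_approx(x, beta) - x * x) for x in xs)


if __name__ == "__main__":
    for beta in (1.0, 10.0, 100.0):
        print(f"beta={beta:6.1f}  max error={max_error(beta):.3e}")
```

Increasing `beta` by a factor of 10 reduces the worst-case error on $[-1, 1]$ by roughly a factor of 100, at the cost of an outer weight that grows as $β^2$. This mirrors, in a much weaker form, the trade-off between weight magnitude and accuracy that the abstract describes.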


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications