Floating-Point Networks with Automatic Differentiation Can Represent Almost All Floating-Point Functions and Their Gradients
Sejun Park, Yeachan Park, Geonho Hwang

TL;DR
This paper shows that floating-point neural networks, with gradients computed by automatic differentiation, can represent almost all floating-point functions and their gradients, despite finite precision and round-off error and with common activation functions.
Contribution
It proves that neural networks under floating-point arithmetic can represent almost all floating-point functions together with their gradients, carrying a result previously known only for real parameters and exact operations over to the finite-precision setting used in practice.
Findings
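Given a floating-point function (e.g., a loss function), a floating-point network can represent arbitrary function values and gradients.
The gradients are taken with respect to the input and are computed by the automatic differentiation algorithm under floating-point arithmetic.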
Results hold for common activation functions like ReLU, ELU, GeLU, Swish, Sigmoid, and tanh.
Abstract
Theoretical studies show that for any differentiable function on a compact domain, there exists a neural network that approximates both the function values and gradients. However, such a result cannot be used in practice since it assumes real parameters and exact internal operations. In contrast, real implementations only use a finite subset of reals and machine operations with round-off errors. In this work, we investigate whether a similar result holds for neural networks under floating-point arithmetic, when the gradient with respect to the input is computed by the automatic differentiation algorithm. We first show that given a floating-point function (e.g., a loss function), arbitrary function values and gradients can be represented by a floating-point network and by its gradient computed through automatic differentiation, respectively. We further extend this result: given…
