Dynamic Universal Approximation Theory: Foundations for Parallelism in Neural Networks

Wei Wang; Qing Li

arXiv:2407.21670·cs.LG·December 2, 2024

TL;DR

This paper introduces a parallel neural network architecture, Para-Former, based on a new theoretical foundation that allows inference speed to remain constant regardless of network depth, addressing a key limitation of traditional serial models.

Contribution

The paper develops a parallelization strategy grounded in an extended Universal Approximation Theorem, so that the inference time of a deep network stays constant regardless of the number of layers.

Findings

01 Para-Former significantly accelerates inference in deep networks.

02 Inference time remains constant as network depth increases.

03 Experimental validation confirms the effectiveness of the parallelization approach.

Abstract

Neural networks are increasingly evolving towards training large models with big data, a method that has demonstrated superior performance across many tasks. However, this approach introduces an urgent problem: current deep learning models are predominantly serial, meaning that as the number of network layers increases, so do the training and inference times. This is unacceptable if deep learning is to continue advancing. Therefore, this paper proposes a deep learning parallelization strategy based on the Universal Approximation Theorem (UAT). From this foundation, we designed a parallel network called Para-Former to test our theory. Unlike traditional serial models, the inference time of Para-Former does not increase with the number of layers, significantly accelerating the inference speed of multi-layer networks. Experimental results validate the effectiveness of this network.
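
The contrast the abstract draws between serial and parallel models can be illustrated with a toy sketch. This is not the Para-Former architecture, which the abstract does not describe in detail. It only assumes, for illustration, that a UAT-style network can be written as a sum of branches that each read the same input. The names `make_layer`, `apply_layer`, `serial_forward`, and `parallel_forward` are hypothetical, and the "steps" value counts dependent layer evaluations on the critical path rather than measured wall-clock time.

```python
import math
import random


def make_layer(dim, rng):
    """Return random (weights, bias) for a dim -> dim tanh layer (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    bias = [rng.uniform(-1, 1) for _ in range(dim)]
    return weights, bias


def apply_layer(layer, x):
    """Compute tanh(W x + b) for one layer."""
    weights, bias = layer
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def serial_forward(layers, x):
    """Serial model: each layer needs the previous layer's output.

    Returns (output, number of dependent layer evaluations).
    """
    steps = 0
    for layer in layers:
        x = apply_layer(layer, x)
        steps += 1
    return x, steps


def parallel_forward(layers, x):
    """Assumed parallel form: every branch reads the same input x.

    The branches do not depend on each other, so with enough hardware they
    could be evaluated at the same time. This loop still runs them one by
    one; only the dependency structure differs from the serial case.
    Returns (output, number of dependent layer evaluations).
    """
    branch_outputs = [apply_layer(layer, x) for layer in layers]
    out = [sum(values) for values in zip(*branch_outputs)]
    # One branch evaluation lies on the critical path; the final sum is ignored here.
    return out, 1
```

Calling both functions with the same list of five layers gives a dependent-step count of 5 for the serial form and 1 for the parallel form, which is the sense in which depth need not lengthen inference. The two forms compute different functions, and the constant-time reading holds only if the hardware can actually run all branches concurrently.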

Taxonomy

Topics: Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings