Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity
Ng Shyh-Chang, A-Li Luo, Bo Qiu

TL;DR
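This paper proves a converse to the universal approximation theorem: the weights of a stably converged neural network with continuous activation functions encode a continuous function that approximates the training data to within a finite margin of error over a bounded domain. It then uses truncated singular value decomposition of each layer's weight matrix to reveal the geometric structure of the learned representations and their role in memory capacity.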
Contribution
It introduces Layer Matrix Decomposition (LMD) to analyze neural networks, linking eigen-decomposition to data encoding, memory, and recent neural network models.
Findings
Neural network weights encode functions approximating training data.
Matrix decomposition reveals the geometric structure of latent spaces.
Memory capacity and expressivity are interconnected in neural networks.
Abstract
We prove the converse of the universal approximation theorem, i.e., a neural network (NN) encoding theorem which shows that for every stably converged NN with continuous activation functions, its weight matrix actually encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain. We further show that, by applying the Eckart-Young theorem to the truncated singular value decomposition of the weight matrix of every NN layer, we can illuminate the nature of the latent space manifold of the training dataset encoded and represented by every NN layer, and the geometric nature of the mathematical operations performed by each NN layer. Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity, and show that the two are complementary. This Layer Matrix Decomposition (LMD)…
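As a rough illustration of the truncated singular value decomposition step mentioned in the abstract, the sketch below builds a rank-k approximation of a single weight matrix and checks the two standard Eckart-Young error identities. It is not the authors' LMD procedure: the matrix `W`, its shape, the chosen rank `k`, and the helper `truncated_svd` are all hypothetical, and the weights are synthetic rather than taken from a trained network.

```python
import numpy as np


def truncated_svd(W, k):
    """Return the rank-k truncation of W and all singular values of W.

    By the Eckart-Young theorem, the truncation is a best rank-k
    approximation of W in both the Frobenius and the spectral norm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back through the first k right singular vectors.
    W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return W_k, s


rng = np.random.default_rng(0)

# Hypothetical layer weights (assumption): a rank-4 signal plus small noise,
# standing in for a 64-output, 32-input dense layer.
W = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
W = W + 0.01 * rng.normal(size=(64, 32))

k = 4
W_k, s = truncated_svd(W, k)

# Frobenius error equals the root-sum-of-squares of the discarded singular values.
frob_err = np.linalg.norm(W - W_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))

# Spectral-norm error equals the first discarded singular value.
spec_err = np.linalg.norm(W - W_k, 2)

print(f"Frobenius error: {frob_err:.6f}, discarded tail: {tail:.6f}")
print(f"Spectral error:  {spec_err:.6f}, sigma_k+1:      {s[k]:.6f}")
```

The retained singular vectors span the low-dimensional subspace that the layer acts on most strongly, which is one way to read the abstract's claim about a latent structure encoded in each layer; how the paper itself maps this onto a manifold is not shown in the truncated text above.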
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings
