Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity

Ng Shyh-Chang; A-Li Luo; Bo Qiu

arXiv:2309.05968 · cs.LG · September 13, 2023

TL;DR

This paper proves a converse to the universal approximation theorem, showing that the weights of a stably converged neural network encode a continuous function approximating its training data. It then uses matrix decomposition to reveal the geometric structure of the learned representations and their role in memory capacity.

Contribution

The paper introduces Layer Matrix Decomposition (LMD), a truncated singular value decomposition of each layer's weight matrix, to analyze neural networks, linking eigen-decomposition to data encoding, memory, and recent neural network models.
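The link between a layer's singular value decomposition and an eigen-decomposition can be checked numerically. The following is a minimal NumPy sketch, assuming a random stand-in weight matrix `W` (hypothetical, not taken from the paper): the squared singular values of `W` equal the eigenvalues of the symmetric matrix `W.T @ W`, and the right singular vectors are its eigenvectors up to sign.

```python
import numpy as np

# Hypothetical stand-in for one layer's weight matrix (out_features x in_features).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))

# Singular value decomposition: W = U @ diag(s) @ Vt, with s in descending order.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Eigen-decomposition of the symmetric Gram matrix W^T W.
# eigh returns eigenvalues in ascending order, so reverse to match s.
eigvals, eigvecs = np.linalg.eigh(W.T @ W)
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

# Squared singular values equal the eigenvalues of W^T W.
print(np.allclose(s**2, eigvals))  # True

# Each right singular vector matches an eigenvector up to a sign flip,
# so the absolute dot product of corresponding columns is 1.
alignment = np.abs(np.sum(Vt.T * eigvecs, axis=0))
print(np.allclose(alignment, 1.0))  # True
```

The sign check uses absolute dot products because eigenvectors and singular vectors are only defined up to a factor of -1 per column.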

Findings

1. Neural network weights encode functions that approximate the training data.
2. Matrix decomposition reveals the geometric structure of latent spaces.
3. Memory capacity and expressivity are complementary: neural networks harness memory capacity for expressivity.

Abstract

We prove the converse of the universal approximation theorem, i.e., a neural network (NN) encoding theorem, which shows that for every stably converged NN of continuous activation functions, its weight matrix actually encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain. We further show that by using the Eckart-Young theorem for the truncated singular value decomposition of the weight matrix of every NN layer, we can illuminate the nature of the latent space manifold of the training dataset encoded and represented by every NN layer, and the geometric nature of the mathematical operations performed by each NN layer. Our results have implications for understanding how NNs break the curse of dimensionality by harnessing memory capacity for expressivity, and show that the two are complementary. This Layer Matrix Decomposition (LMD)…
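The truncated SVD step can be illustrated with a short sketch. This is a minimal NumPy example, assuming a random stand-in weight matrix and an arbitrary rank `k = 8` (both hypothetical; the paper's own layers and truncation choices are not reproduced here). It builds the rank-k truncated SVD of a layer's weight matrix and checks the Eckart-Young guarantee that the Frobenius error of the best rank-k approximation equals the root-sum-square of the discarded singular values. It also splits the layer's linear map into three geometric steps: projection onto the top-k right singular directions, per-axis scaling by the singular values, and embedding into the output space.

```python
import numpy as np

def truncated_svd(W: np.ndarray, k: int):
    """Return the rank-k factors (U_k, s_k, Vt_k) of W via SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical layer weight matrix (out_features x in_features) and rank.
rng = np.random.default_rng(42)
W = rng.standard_normal((128, 64))
k = 8

U_k, s_k, Vt_k = truncated_svd(W, k)
W_k = U_k @ np.diag(s_k) @ Vt_k  # best rank-k approximation (Eckart-Young)

# Eckart-Young: the Frobenius error equals the square root of the
# sum of squared singular values that were discarded.
s_all = np.linalg.svd(W, compute_uv=False)
err = np.linalg.norm(W - W_k, "fro")
print(np.isclose(err, np.sqrt(np.sum(s_all[k:] ** 2))))  # True

# Geometric reading of the truncated linear map x -> W_k @ x:
x = rng.standard_normal(64)
z = Vt_k @ x          # coordinates of x along the top-k right singular directions
z_scaled = s_k * z    # per-axis stretching by the singular values
y = U_k @ z_scaled    # embedding back into the output space
print(np.allclose(y, W_k @ x))  # True
```

For a trained model, the same function could be called on each layer's weight matrix in turn. How `k` should be chosen per layer is not specified here and would need to follow the paper.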


Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings