# Spectral Ergodicity in Deep Learning Architectures via Surrogate Random Matrices

**Authors:** Mehmet Süzen, Cornelius Weber, Joan J. Cerdà

arXiv: 1704.08303 · 2017-09-07

## TL;DR

This paper introduces a method that combines the Thirumalai-Mountain (TM) metric with the Kullback-Leibler (KL) divergence to quantify spectral ergodicity in random matrices. Applied to circular random matrix ensembles that mimic typical neural network weight matrices, it shows that larger matrices exhibit higher spectral ergodicity, which the authors conjecture is tied to the success of deep learning.

## Contribution

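The paper proposes a measure of spectral ergodicity for random matrices that combines the Thirumalai-Mountain (TM) metric with the Kullback-Leibler (KL) divergence, and applies it to surrogate matrix ensembles standing in for neural network weights. The authors suggest the measure could be used to optimise network size and architecture.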
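For reference, the KL divergence mentioned above has the following standard discrete form. The symbols $p_i$ and $q_i$ are notation introduced here for two binned spectral densities and are not taken from the paper; how the paper combines this quantity with the TM metric is given in the full text.

$$
D_{\mathrm{KL}}(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i}, \qquad \sum_i p_i = \sum_i q_i = 1 .
$$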

## Key findings

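- Spectral ergodicity increases with matrix size for the circular ensembles studied (CUE, COE, CSE).
- The eigenvalue spectrum of a single realisation approaches the ensemble-averaged spectrum as size grows.
- The authors conjecture that the success of deep learning architectures is strongly bound to spectral ergodicity.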

## Abstract

In this work a novel method to quantify spectral ergodicity for random matrices is presented. The new methodology combines approaches rooted in the metrics of Thirumalai-Mountain (TM) and Kullback-Leibler (KL) divergence. The method is applied to a general study of deep and recurrent neural networks via the analysis of random matrix ensembles mimicking typical weight matrices of those systems. In particular, we examine circular random matrix ensembles: circular unitary ensemble (CUE), circular orthogonal ensemble (COE), and circular symplectic ensemble (CSE). Eigenvalue spectra and spectral ergodicity are computed for those ensembles as a function of network size. It is observed that as the matrix size increases the level of spectral ergodicity of the ensemble rises, i.e., the eigenvalue spectrum obtained for a single realisation drawn at random from the ensemble is closer to the spectrum obtained by averaging over the whole ensemble. Based on previous results we conjecture that the success of deep learning architectures is strongly bound to the concept of spectral ergodicity. The method to compute spectral ergodicity proposed in this work could be used to optimise the size and architecture of deep as well as recurrent neural networks.
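The idea of comparing a single realisation's spectrum with the ensemble average can be illustrated with a rough sketch. This is not the paper's measure: it omits the TM component, uses only the CUE, and simply averages the KL divergence between each realisation's binned eigenphase density and the ensemble mean. The function names, bin count, and number of realisations are hypothetical choices made for this illustration. CUE matrices are drawn with the usual QR-with-phase-correction recipe for Haar-distributed unitaries.

```python
import numpy as np


def sample_cue(n, rng):
    """Draw an n x n Haar-distributed unitary (CUE) via QR of a complex Gaussian matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2.0)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Rescale each column of q by the phase of the matching diagonal entry of r,
    # which removes the bias of the raw QR factor and yields a Haar-distributed unitary.
    return q * (d / np.abs(d))


def eigenphase_density(u, bins):
    """Normalised histogram of eigenvalue phases of a unitary matrix."""
    angles = np.angle(np.linalg.eigvals(u))
    counts, _ = np.histogram(angles, bins=bins)
    return counts / counts.sum()


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q); eps guards against empty bins (illustrative choice)."""
    p = p + eps
    q = q + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def mean_single_vs_ensemble_kl(n, n_realizations=50, n_bins=16, seed=0):
    """Average KL divergence of single-realisation densities from the ensemble mean.

    Smaller values mean a single matrix's spectrum looks more like the ensemble average.
    This is an assumed stand-in, not the spectral ergodicity measure defined in the paper.
    """
    rng = np.random.default_rng(seed)
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    densities = np.array(
        [eigenphase_density(sample_cue(n, rng), bins) for _ in range(n_realizations)]
    )
    ensemble = densities.mean(axis=0)
    return float(np.mean([kl_divergence(d, ensemble) for d in densities]))
```

Calling `mean_single_vs_ensemble_kl` for a small and a large `n` (for example 8 and 64) gives a quick qualitative check of the size trend described in the abstract; the absolute values depend on the binning and should not be compared with figures in the paper.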

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.08303/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1704.08303/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1704.08303/full.md

---
Source: https://tomesphere.com/paper/1704.08303