Opening the Black Box: predicting the trainability of deep neural networks with reconstruction entropy

Yanick Thurn; Ro Jefferson; Johanna Erdmenger

arXiv:2406.12916 · cs.LG · December 23, 2024

TL;DR

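This paper introduces a method for predicting early whether a deep neural network will be trainable, by reconstructing its inputs from layer activations. A single epoch of training the shallow auxiliary reconstruction networks is enough to make the prediction, which costs far less than training the deep network itself; the approach is demonstrated on several architectures and datasets (MNIST, CIFAR10, FashionMNIST, and white noise).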

Contribution

The authors propose an input reconstruction entropy measure that predicts network trainability after a single epoch of training the auxiliary reconstruction networks, and that applies to multiple neural network types.

Findings

01 Reconstruction entropy correlates with network trainability.

02 The method reduces overall training time by identifying untrainable networks early.

03 The method is applicable to DNNs, ResNets, and CNNs.

Abstract

An important challenge in machine learning is to predict the initial conditions under which a given neural network will be trainable. We present a method for predicting the trainable regime in parameter space for deep feedforward neural networks (DNNs) based on reconstructing the input from subsequent activation layers via a cascade of single-layer auxiliary networks. We show that a single epoch of training of the shallow cascade networks is sufficient to predict the trainability of the deep feedforward network on a range of datasets (MNIST, CIFAR10, FashionMNIST, and white noise), thereby providing a significant reduction in overall training time. We achieve this by computing the relative entropy between reconstructed images and the original inputs, and show that this probe of information loss is sensitive to the phase behaviour of the network. We further demonstrate that this method…
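
The relative entropy mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how images are turned into probability distributions, so the normalisation below (pixel intensities rescaled to sum to 1), the smoothing constant `eps`, the direction of the divergence, and the function names `to_distribution` and `relative_entropy` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def to_distribution(pixels, eps=1e-12):
    """Rescale non-negative pixel intensities so they sum to 1.

    Assumption: an image is treated as a distribution over pixel positions.
    eps keeps every entry strictly positive so the logarithm is defined.
    """
    shifted = [max(float(p), 0.0) + eps for p in pixels]
    total = sum(shifted)
    return [s / total for s in shifted]


def relative_entropy(original, reconstructed, eps=1e-12):
    """KL divergence D(p || q) in nats between two flattened images.

    p comes from the original input, q from the reconstruction
    (the direction is an assumption of this sketch).
    """
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    p = to_distribution(original, eps)
    q = to_distribution(reconstructed, eps)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Toy 4-pixel "images": a perfect reconstruction and a washed-out one.
original = [0.0, 0.2, 0.8, 1.0]
perfect = list(original)
washed_out = [0.5, 0.5, 0.5, 0.5]

print(relative_entropy(original, perfect))
print(relative_entropy(original, washed_out))
```

Under these assumptions, a reconstruction identical to the input gives a divergence of zero, while a reconstruction that has lost the input's structure gives a strictly positive value, which is the sense in which the quantity acts as a probe of information loss.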

Code & Models

Repositories

YanickT/Infoflow-paper
PyTorch · Official

Taxonomy

Topics: Neural Networks and Applications

Methods: Dense Connections · Feedforward Network