The learning phases in NN: From Fitting the Majority to Fitting a Few

Johannes Schneider

arXiv:2202.08299·cs.LG·February 18, 2022

Open Access

TL;DR

This paper investigates the learning dynamics of deep neural networks and reports a two-phase process: an initial phase in which reconstruction of the input improves, followed by a phase that fits the classification of a few samples while reconstruction loss rises. The claim is supported by theoretical analysis and by experiments on standard architectures.

Contribution

It introduces a new perspective on neural network learning phases by analyzing reconstruction and classification performance, challenging existing theories like the information bottleneck.

Findings

01 Identification of a prototyping phase with decreasing reconstruction loss

02 Subsequent phase focusing on classifying a few samples with increased reconstruction loss

03 Validation of the analysis on common computer vision architectures like ResNet and VGG

Abstract

The learning dynamics of deep neural networks are subject to controversy. Using the information bottleneck (IB) theory, separate fitting and compression phases have been put forward, but they have since been heavily debated. We approach learning dynamics by analyzing a layer's ability to reconstruct the input and its prediction performance, based on the evolution of parameters during training. We show that, under mild assumptions on the data, there is a prototyping phase that initially decreases reconstruction loss, followed by a phase that reduces the classification loss of a few samples and thereby increases reconstruction loss. Aside from providing a mathematical analysis of single-layer classification networks, we also assess the behavior using common datasets and architectures from computer vision, such as ResNet and VGG.
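
The kind of measurement the abstract describes, tracking a layer's reconstruction loss alongside its classification loss over training, can be sketched as follows. This is a minimal illustration and not the paper's experimental setup: it assumes a single-layer softmax classifier, synthetic Gaussian-cluster data, and a least-squares linear decoder from the layer's outputs back to the inputs. The function names (`make_toy_data`, `reconstruction_loss`, `train_and_track`) and all hyperparameters are hypothetical choices made for this example.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classification_loss(W, X, y):
    # Mean cross-entropy of a single-layer softmax classifier.
    p = softmax(X @ W)
    return float(-np.log(p[np.arange(len(y)), y] + 1e-12).mean())


def reconstruction_loss(W, X):
    # Assumption: reconstruction ability is measured by the best linear
    # decoder (least squares) mapping layer outputs back to the inputs.
    H = X @ W
    D, *_ = np.linalg.lstsq(H, X, rcond=None)
    return float(((H @ D - X) ** 2).mean())


def make_toy_data(n=300, d=20, n_classes=3, seed=0):
    # Hypothetical synthetic data: one Gaussian cluster per class.
    rng = np.random.default_rng(seed)
    centers = rng.standard_normal((n_classes, d))
    y = rng.integers(0, n_classes, size=n)
    X = centers[y] + 0.5 * rng.standard_normal((n, d))
    return X, y


def train_and_track(X, y, n_classes, steps=200, lr=0.05, seed=0):
    # Full-batch gradient descent; both losses are recorded before each update.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot targets
    history = []
    for step in range(steps):
        history.append(
            (step, classification_loss(W, X, y), reconstruction_loss(W, X))
        )
        grad = X.T @ (softmax(X @ W) - Y) / len(y)
        W -= lr * grad
    return W, history


if __name__ == "__main__":
    X, y = make_toy_data()
    _, history = train_and_track(X, y, n_classes=3)
    for step, cls, rec in history[::40]:
        print(f"step {step:3d}  classification {cls:.4f}  reconstruction {rec:.4f}")
```

Plotting the two columns of `history` against the step index gives the curves one would inspect for the phases described above. Whether a toy setup like this one shows both phases is not guaranteed; the linear decoder in particular is a simplification chosen to keep the sketch dependency-free.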

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Anomaly Detection Techniques and Applications

Methods: Average Pooling · 1x1 Convolution · Global Average Pooling · Dropout · Convolution · Batch Normalization · Residual Connection · Bottleneck Residual Block · Residual Block