Statistical-mechanical analysis of pre-training and fine tuning in deep learning

Masayuki Ohzeki

arXiv:1501.04413·stat.ML·June 23, 2015

TL;DR

This paper uses statistical mechanics to analyze how unsupervised pre-training and supervised fine-tuning in deep learning affect generalization, revealing a phase transition in the generalization error that depends on the amount of unlabeled data.

Contribution

It introduces a statistical-mechanical framework, built on a single-layer perceptron model, for understanding pre-training and fine-tuning, and shows that the generalization error undergoes a phase transition as the amount of unlabeled data varies.
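
Both the TL;DR and the contribution are stated in terms of the generalization error. In the textbook teacher-student perceptron setting (an assumption here; the paper's margin-based data model may differ), that error is a closed-form function of the normalized overlap R between the student weights w and the teacher weights B, namely ε = arccos(R)/π for isotropic Gaussian inputs. A minimal Python sketch, with hypothetical helper names, that evaluates this formula and checks it by Monte Carlo:

```python
import math
import random


def overlap(w, b):
    """Normalized overlap R = w.b / (|w| |b|) between student and teacher."""
    dot = sum(x * y for x, y in zip(w, b))
    return dot / math.sqrt(sum(x * x for x in w) * sum(y * y for y in b))


def generalization_error(r):
    """Closed form eps = arccos(R) / pi, assuming isotropic Gaussian inputs."""
    r = max(-1.0, min(1.0, r))  # clamp rounding error outside [-1, 1]
    return math.acos(r) / math.pi


def monte_carlo_error(teacher, student, n_samples=20000, seed=0):
    """Fraction of random Gaussian inputs on which sign(w.x) != sign(B.x)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in teacher]
        t = sum(b * xi for b, xi in zip(teacher, x))
        s = sum(w * xi for w, xi in zip(student, x))
        if (t >= 0) != (s >= 0):
            errors += 1
    return errors / n_samples


if __name__ == "__main__":
    teacher = [1.0, 0.0, 0.0]
    student = [1.0, 1.0, 0.0]  # 45 degrees from the teacher
    print(generalization_error(overlap(student, teacher)))  # about 0.25
    print(monte_carlo_error(teacher, student))              # close to 0.25
```

The Monte Carlo estimate should land near the closed-form value. The point is only that, in this setting, tracking ε reduces to tracking R, which is the kind of order parameter replica analyses of perceptrons typically work with.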

Findings

01

Identifies a phase transition in the generalization error that depends on the amount of unlabeled data.

02

Evaluates the efficacy of unsupervised pre-training through a replica-method analysis.

03

Validates theoretical results with belief propagation algorithms.
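
For readers unfamiliar with the replica method named in finding 02: it targets the quenched average of the free energy, -E[ln Z], over the random training data. It rests on the standard identity below, in which E[Z^n] is computed for integer n and then continued analytically to n → 0. The specific partition function Z and order parameters used in the paper are not reproduced here.

```latex
\mathbb{E}\left[\ln Z\right]
  = \lim_{n \to 0} \frac{\mathbb{E}\left[Z^{n}\right] - 1}{n}
  = \lim_{n \to 0} \frac{\partial}{\partial n} \ln \mathbb{E}\left[Z^{n}\right]
```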

Abstract

In this paper, we present a statistical-mechanical analysis of deep learning. We elucidate some of its essential components: pre-training by unsupervised learning and fine tuning by supervised learning. We formulate the extraction of features from the training data as a margin criterion in a high-dimensional feature-vector space. The self-organized classifier is then supplied with small amounts of labelled data, as in deep learning. Although we employ a simple single-layer perceptron model rather than directly analyzing a multi-layer neural network, we find a nontrivial phase transition in the generalization error of the resultant classifier that depends on the number of unlabelled data. In this sense, we evaluate the efficacy of the unsupervised learning component of deep learning. The analysis is performed by the replica method, which is a sophisticated tool in…
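
The pipeline in the abstract (feature extraction from unlabelled data, then a small labelled set for fine tuning, scored by generalization error) can be illustrated with a toy single-layer classifier. This is a minimal sketch under stated assumptions, not the paper's model: the two-cluster data generator, the use of power iteration (leading principal direction) as a stand-in for the paper's margin criterion, and all function names (`sample`, `pretrain`, `finetune`, `heldout_error`) are hypothetical, and no replica calculation is reproduced.

```python
import math
import random


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample(teacher, margin, rng):
    """Hypothetical two-cluster data: x = y * margin * B + Gaussian noise."""
    y = rng.choice((-1, 1))
    x = [y * margin * b + rng.gauss(0.0, 1.0) for b in teacher]
    return x, y


def pretrain(unlabeled, n_iter=100, seed=1):
    """Unsupervised step (stand-in for the margin criterion, an assumption):
    leading eigenvector of the sample second-moment matrix, found by power
    iteration. Unlabelled data fix the axis but not its sign."""
    n, p = len(unlabeled[0]), len(unlabeled)
    cov = [[sum(x[i] * x[j] for x in unlabeled) / p for j in range(n)]
           for i in range(n)]
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(n_iter):
        w = normalize([dot(row, w) for row in cov])
    return w


def finetune(w, labeled, lr=0.01, epochs=20):
    """Supervised step: orient w using the labels, then a few perceptron passes."""
    if sum(y * dot(w, x) for x, y in labeled) < 0:
        w = [-c for c in w]
    for _ in range(epochs):
        mistakes = 0
        for x, y in labeled:
            if y * dot(w, x) <= 0:  # misclassified: perceptron update
                w = [c + lr * y * xi for c, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return normalize(w)


def heldout_error(w, teacher, margin, n_test=2000, seed=2):
    """Empirical generalization error on fresh labelled samples."""
    rng = random.Random(seed)
    data = [sample(teacher, margin, rng) for _ in range(n_test)]
    return sum(1 for x, y in data if y * dot(w, x) <= 0) / n_test


if __name__ == "__main__":
    n, margin = 20, 2.0
    rng = random.Random(0)
    teacher = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    unlabeled = [sample(teacher, margin, rng)[0] for _ in range(500)]  # labels dropped
    labeled = [sample(teacher, margin, rng) for _ in range(10)]
    w = finetune(pretrain(unlabeled), labeled)
    print("overlap R =", dot(w, teacher))
    print("held-out error =", heldout_error(w, teacher, margin))
```

In this toy, the unlabelled samples alone recover the classification axis only up to sign; the ten labelled examples fix the orientation and nudge the weights. Varying the number of unlabelled samples (500 above) is one way to explore how the overlap with the teacher changes, though nothing here reproduces the paper's phase transition.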