Statistical-mechanical analysis of pre-training and fine tuning in deep learning
Masayuki Ohzeki

TL;DR
This paper uses statistical mechanics to analyze unsupervised pre-training and supervised fine-tuning in a simple single-layer perceptron model, and finds a phase transition in the generalization error that depends on the amount of unlabeled data.
Contribution
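It introduces a statistical-mechanical framework for pre-training and fine-tuning: feature extraction from unlabeled data is formulated as a margin criterion in a high-dimensional feature-vector space, and the resulting classifier is then fine-tuned with a small amount of labeled data. The analysis reveals a phase transition in the generalization error that depends on the number of unlabeled examples.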
Findings
Identifies a phase transition in generalization error related to unlabeled data quantity.
Demonstrates the efficacy of unsupervised pre-training through a replica method analysis.
Validates theoretical results with belief propagation algorithms.
Abstract
In this paper, we present a statistical-mechanical analysis of deep learning. We elucidate some of the essential components of deep learning: pre-training by unsupervised learning and fine tuning by supervised learning. We formulate the extraction of features from the training data as a margin criterion in a high-dimensional feature-vector space. The self-organized classifier is then supplied with small amounts of labelled data, as in deep learning. Although we employ a simple single-layer perceptron model, rather than directly analyzing a multi-layer neural network, we find a nontrivial phase transition that is dependent on the number of unlabelled data in the generalization error of the resultant classifier. In this sense, we evaluate the efficacy of the unsupervised learning component of deep learning. The analysis is performed by the replica method, which is a sophisticated tool in…
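To make the two quantities named in the abstract concrete, here is a minimal sketch for a single-layer perceptron. It is an illustration under stated assumptions, not the paper's formulation. The truncated abstract does not give the exact form of the margin criterion, so the code assumes the common reading that no unlabeled input may lie within a distance `kappa` of the decision hyperplane. The generalization error uses the standard result for two perceptrons on isotropic Gaussian inputs, where the disagreement probability is arccos(R)/π for normalized overlap R. The function names, `kappa`, and the data-generation scheme are all hypothetical.

```python
import numpy as np


def unlabeled_margins(w, X):
    """Distance of each unlabeled input (row of X) from the hyperplane w.x = 0."""
    return np.abs(X @ w) / np.linalg.norm(w)


def satisfies_margin(w, X, kappa):
    """Assumed margin criterion: no unlabeled input lies within kappa of the hyperplane."""
    return bool(np.all(unlabeled_margins(w, X) >= kappa))


def generalization_error(w, w_teacher):
    """Disagreement probability of two perceptrons on isotropic Gaussian inputs.

    Uses the standard relation error = arccos(R) / pi, where R is the
    normalized overlap between the student and teacher weight vectors.
    """
    overlap = w @ w_teacher / (np.linalg.norm(w) * np.linalg.norm(w_teacher))
    return float(np.arccos(np.clip(overlap, -1.0, 1.0)) / np.pi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_dim, kappa = 50, 0.5  # illustrative values, not taken from the paper

    # Hypothetical teacher and unlabeled data with a gap around its hyperplane.
    w_teacher = rng.standard_normal(n_dim)
    X = rng.standard_normal((2000, n_dim))
    X_unlabeled = X[unlabeled_margins(w_teacher, X) >= kappa]

    # The teacher satisfies the criterion by construction; a random direction
    # is checked against the same data for comparison.
    w_random = rng.standard_normal(n_dim)
    print("teacher satisfies margin:", satisfies_margin(w_teacher, X_unlabeled, kappa))
    print("random w satisfies margin:", satisfies_margin(w_random, X_unlabeled, kappa))
    print("generalization error of random w:", generalization_error(w_random, w_teacher))
```

In this picture, the unlabeled data only restrict which hyperplanes are admissible. A small amount of labeled data is then needed to choose among the admissible directions and fix the sign of the classifier.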
