Phase transitions in the mini-batch size for sparse and dense two-layer neural networks

Raffaele Marino; Federico Ricci-Tersenghi

arXiv:2305.06435·cond-mat.dis-nn·January 17, 2024·2 cites

Open Access

TL;DR

This paper investigates how the mini-batch size affects the training and generalization of two-layer neural networks, revealing phase transitions where performance sharply changes at a critical batch size.

Contribution

It provides a quantitative analysis of mini-batch size effects in neural networks, identifying phase transitions in learning performance based on statistical mechanics concepts.

Findings

01 Generalization performance depends strongly on mini-batch size

02 Sharp phase transitions occur at a critical mini-batch size

03 Training failure or success is determined by crossing this critical size

Abstract

Mini-batches of data are now very commonly used to train artificial neural networks. Despite this broad usage, a theory that quantitatively explains how large or small the optimal mini-batch size should be is still missing. This work presents a systematic attempt at understanding the role of the mini-batch size in training two-layer neural networks. Working in the teacher-student scenario, with a sparse teacher, and focusing on tasks of different complexity, we quantify the effects of changing the mini-batch size $m$. We find that the generalization performance of the student often depends strongly on $m$ and may undergo sharp phase transitions at a critical value $m_{c}$, such that for $m < m_{c}$ the training process fails, while for $m > m_{c}$ the student learns the teacher perfectly or generalizes very well. Phase transitions are induced by collective phenomena first discovered in…
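The teacher-student setup described in the abstract can be illustrated with a small numerical sketch. The abstract does not specify the architecture details, so everything concrete here is an assumption made for illustration: the tanh activation, the fixed averaging second layer, the squared loss, the Gaussian inputs, the sparsity mechanism, and all sizes and hyperparameters. The function names are hypothetical and this is not the authors' code; it only shows how one might scan the mini-batch size $m$ and measure the student's generalization error.

```python
import numpy as np

def make_sparse_teacher(n_inputs, n_hidden, density, rng):
    """Gaussian first-layer weights, each kept with probability `density` (assumed sparsity model)."""
    weights = rng.standard_normal((n_hidden, n_inputs))
    mask = rng.random((n_hidden, n_inputs)) < density
    return weights * mask

def forward(weights, x):
    """Two-layer network: tanh hidden units, second layer fixed to an average (an assumption)."""
    return np.tanh(x @ weights.T / np.sqrt(x.shape[1])).mean(axis=1)

def train_student(teacher, x_train, m, epochs, lr, rng):
    """Plain mini-batch SGD on the squared loss, with mini-batches of size m."""
    n_hidden, n_inputs = teacher.shape
    student = 0.1 * rng.standard_normal((n_hidden, n_inputs))
    y_train = forward(teacher, x_train)  # labels are produced by the teacher
    n_samples = x_train.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, m):
            idx = order[start:start + m]
            x, y = x_train[idx], y_train[idx]
            hidden = np.tanh(x @ student.T / np.sqrt(n_inputs))
            err = hidden.mean(axis=1) - y
            # Gradient of 0.5 * mean(err**2) with respect to the student weights.
            grad_pre = err[:, None] * (1.0 - hidden**2) / n_hidden
            grad = grad_pre.T @ x / (np.sqrt(n_inputs) * len(idx))
            student -= lr * grad
    return student

def generalization_error(teacher, student, x_test):
    """Half mean squared difference between student and teacher outputs on fresh inputs."""
    diff = forward(student, x_test) - forward(teacher, x_test)
    return 0.5 * np.mean(diff**2)

# Illustrative scan over the mini-batch size m (all sizes are arbitrary choices).
rng = np.random.default_rng(0)
teacher = make_sparse_teacher(n_inputs=20, n_hidden=3, density=0.3, rng=rng)
x_train = rng.standard_normal((200, 20))
x_test = rng.standard_normal((500, 20))
errors = {}
for m in (1, 10, 100):
    student = train_student(teacher, x_train, m, epochs=5, lr=0.5,
                            rng=np.random.default_rng(1))
    errors[m] = generalization_error(teacher, student, x_test)
    print(f"m = {m:3d}  generalization error = {errors[m]:.4e}")
```

A toy run of this size is not expected to reproduce the sharp transition at $m_{c}$ reported in the paper; it only makes explicit which quantities ($m$, the teacher, the student, the generalization error) enter such an experiment.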

Taxonomy

Topics: Neural Networks and Applications