The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, and Florent Krzakala

arXiv:2402.03220·stat.ML·September 6, 2024·2 cites

TL;DR

This paper demonstrates that multi-pass gradient descent on two-layer neural networks can learn a broader class of functions more efficiently than single-pass methods, overcoming previous theoretical limitations.

Contribution

It introduces an analysis of multi-pass GD, based on Dynamical Mean-Field Theory, showing that reusing batches lets two-layer networks learn target functions that single-pass GD cannot learn efficiently.

Findings

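1. Multi-pass GD achieves an overlap with the target subspace in just two time steps, even for functions that do not satisfy the staircase property.

2. Reusing batches overcomes the limitations that the information exponent and the leap exponent impose on gradient flow and single-pass GD.

3. The theoretical results are supported by numerical experiments.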

Abstract

We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the limitations of gradient flow and single-pass GD given by the information exponent (Ben Arous et al., 2021) and leap exponent (Abbe et al., 2023) of the target function. We show that upon re-using batches, the network achieves in just two time steps an overlap with the target subspace even for functions not satisfying the staircase property (Abbe et al., 2021). We characterize the (broad) class of functions efficiently learned in finite time. The proof of our results is based on the analysis of…
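To make the setting in the abstract concrete, the following is a minimal NumPy sketch of two full-batch GD steps on the first layer of a two-layer network, taken either on the same batch twice (reuse) or on two fresh batches (single-pass). It is not the authors' code: the single-index target `he3`, the ReLU activation, the fixed random second layer, the step size, and all function names are illustrative assumptions. The sketch only reports the average absolute cosine between the hidden weights and the target direction; it makes no claim about which variant gives the larger value at these sizes.

```python
import numpy as np

def he3(z):
    # Third probabilists' Hermite polynomial; chosen here as an
    # illustrative single-index target (an assumption, not from the page).
    return z**3 - 3.0 * z

def grad_first_layer(W, a, X, y):
    # Gradient of 0.5 * mean((f(x) - y)^2) with respect to W, where
    # f(x) = (1/p) * sum_j a_j * relu(<w_j, x>).
    n, p = X.shape[0], W.shape[0]
    pre = X @ W.T                           # (n, p) pre-activations
    f = np.maximum(pre, 0.0) @ a / p        # (n,) network outputs
    err = f - y                             # (n,) residuals
    back = err[:, None] * (pre > 0) * a[None, :]   # (n, p)
    return back.T @ X / (p * n)             # (p, d)

def two_step_overlap(reuse, d=64, p=32, n=None, lr=None, seed=0):
    # Runs exactly two GD steps and returns the mean |cosine| between
    # each hidden weight vector and the target direction w_star.
    rng = np.random.default_rng(seed)
    n = 4 * d if n is None else n           # batch size: illustrative choice
    lr = float(p) if lr is None else lr     # step size: illustrative choice
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=p)     # second layer kept fixed
    X1 = rng.standard_normal((n, d))
    # Reuse: the second step sees the same batch. Otherwise: a fresh batch.
    X2 = X1 if reuse else rng.standard_normal((n, d))
    for X in (X1, X2):
        y = he3(X @ w_star)
        W = W - lr * grad_first_layer(W, a, X, y)
    cos = np.abs(W @ w_star) / np.linalg.norm(W, axis=1)
    return float(cos.mean())

if __name__ == "__main__":
    print("reused batch :", two_step_overlap(reuse=True))
    print("fresh batches:", two_step_overlap(reuse=False))
```

Comparing the two printed overlaps across several seeds and larger `d` is one way to probe the effect described in the abstract. The experiments behind the paper's results are in the official repository listed under Code & Models.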

Code & Models

Repositories

idephics/benefit-reusing-batch (PyTorch, official)

Taxonomy

Topics: Advanced Optical Network Technologies · Network Traffic and Congestion Control · Stochastic Gradient Optimization Techniques

Methods: Focus