On the Power of Differentiable Learning versus PAC and SQ Learning

Emmanuel Abbe; Pritish Kamath; Eran Malach; Colin Sandon; Nathan Srebro

arXiv:2108.04190 · cs.LG · February 8, 2022 · 1 cites

TL;DR

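This paper studies what can be learned by mini-batch stochastic gradient descent (SGD) on the population loss and by batch gradient descent (GD) on the empirical loss of a differentiable model or neural network. It shows how their power relates to PAC and SQ learning depending on gradient precision relative to the minibatch size (for SGD) and the sample size (for GD).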

Contribution

The paper establishes conditions under which SGD and GD can simulate PAC learning, extending prior work that covered minibatch size $b = 1$ and clarifying how gradient precision, minibatch size, and sample size affect learning power.

Findings

01

SGD can simulate PAC learning when gradient precision is fine enough relative to the minibatch size, that is, when $b ρ$ is small enough.

02

GD can also simulate PAC learning given fine enough gradient precision relative to the sample size $m$.

03

When precision is too coarse relative to the minibatch size, SGD's power reduces to that of SQ learning.

Abstract

We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $ρ$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b ρ$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b = 1$. Similarly, with fine enough precision relative to the sample size $m$,…
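To make the roles of the precision $ρ$ and the minibatch size $b$ concrete, here is a toy sketch of a mini-batch SGD step in which the averaged gradient is rounded before the update. It rests on several assumptions that are not taken from the paper: precision is modeled as rounding to the nearest multiple of `rho`, the model is a one-parameter linear regressor, and the target `y = 3x` and all function names are hypothetical. It only shows where $ρ$ and $b$ enter the update and does not reproduce the paper's simulation arguments.

```python
import random


def round_to_precision(value, rho):
    """Round value to the nearest multiple of rho (an assumed precision model)."""
    return rho * round(value / rho)


def minibatch_sgd_step(w, batch, lr, rho):
    """One SGD step on squared loss for the 1-D model y ~ w * x.

    The gradient is averaged over the mini-batch, then rounded to
    precision rho before it is applied.
    """
    b = len(batch)
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / b
    return w - lr * round_to_precision(grad, rho)


def run_sgd(rho, b=8, steps=200, lr=0.1, seed=0):
    """Run SGD with fresh mini-batches of size b drawn at every step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = []
        for _ in range(b):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 3.0 * x))  # hypothetical target: y = 3x
        w = minibatch_sgd_step(w, batch, lr, rho)
    return w


if __name__ == "__main__":
    print("fine precision:  ", run_sgd(rho=1e-6))
    print("coarse precision:", run_sgd(rho=100.0))
```

In this toy setting, a fine `rho` lets the averaged gradient carry information to the weight, while a very coarse `rho` rounds every gradient to zero so the weight never moves from its starting value.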

Videos

On the Power of Differentiable Learning versus PAC and SQ Learning· slideslive

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent