On the Power of Differentiable Learning versus PAC and SQ Learning
Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

TL;DR
This paper investigates the capabilities of stochastic gradient descent and batch gradient descent in learning neural networks, showing how their power relates to PAC and SQ learning depending on gradient precision and batch size.
Contribution
It establishes conditions under which SGD and GD can simulate PAC learning, extending prior results and clarifying the impact of gradient precision and batch size on learning power.
Findings
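SGD can simulate PAC learning when the gradient precision is fine enough relative to the minibatch size.
GD can also simulate PAC learning when the gradient precision is fine enough relative to the sample size.
When the precision is coarse relative to the minibatch size, SGD's power reduces to that of SQ learning.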
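To make the two quantities in these findings concrete, here is a toy sketch of mini-batch SGD in which every gradient is rounded to a fixed precision before the update. It is only an illustration, not the paper's construction: the rounding rule (nearest multiple of `rho`), the one-parameter linear model, the squared loss, and all names (`round_to_precision`, `minibatch_sgd`, `rho`, `batch_size`) are assumptions chosen for the example, and the code does not demonstrate the PAC versus SQ separation itself.

```python
import random


def round_to_precision(value, rho):
    """Round value to the nearest integer multiple of rho (assumed precision model)."""
    return rho * round(value / rho)


def minibatch_sgd(samples, rho, batch_size, steps, lr=0.1, seed=0):
    """Fit y ~ w * x with squared loss, using rho-rounded minibatch gradients."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Draw a minibatch of size batch_size with replacement.
        batch = [rng.choice(samples) for _ in range(batch_size)]
        # Average gradient of 0.5 * (w * x - y)^2 with respect to w.
        grad = sum((w * x - y) * x for x, y in batch) / batch_size
        # The learner only ever sees the gradient at precision rho.
        w -= lr * round_to_precision(grad, rho)
    return w


# Noiseless data generated by y = 2x on a small grid (hypothetical toy task).
samples = [(k / 10.0, 2.0 * k / 10.0) for k in range(-10, 11)]

w_fine = minibatch_sgd(samples, rho=1e-6, batch_size=4, steps=500)
w_coarse = minibatch_sgd(samples, rho=10.0, batch_size=4, steps=500)
```

In this toy setting the fine-precision run ends close to the true slope of 2, while the coarse-precision run never moves away from its starting point because every averaged gradient rounds to zero. The results summarized above concern how much information about individual samples survives this kind of rounding as the minibatch or sample size grows.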
Abstract
We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $\rho$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b\rho$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$,…
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
