Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
Mahdi Haghifam, Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund, Daniel M. Roy, Gintare Karolina Dziugaite

TL;DR
This paper demonstrates that existing information-theoretic bounds are insufficient for establishing minimax rates in stochastic convex optimization with gradient descent, highlighting the need for new analytical approaches.
Contribution
The paper shows the limitations of current information-theoretic frameworks in deriving minimax rates for gradient descent in stochastic convex optimization.
Findings
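Existing information-theoretic bounds (input-output mutual information, conditional mutual information and variants, PAC-Bayes, and conditional PAC-Bayes variants) cannot establish minimax rates for gradient descent in stochastic convex optimization.
Analyzing a noisy surrogate algorithm, in which the final iterate is corrupted by Gaussian noise, does not yield minimax rates either.
New ideas are needed to analyze gradient descent with information-theoretic techniques.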
Abstract
To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using…
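The "surrogate" construction mentioned in the abstract can be illustrated with a minimal sketch: run gradient descent, then add isotropic Gaussian noise to the final iterate. The quadratic loss, step size, iteration count, and noise scale below are illustrative assumptions and are not taken from the paper, whose analysis concerns general stochastic convex optimization problems.

```python
import random

def gradient_descent(samples, steps=100, lr=0.1):
    """Full-batch GD on the empirical risk (1/n) * sum_i 0.5 * ||w - z_i||^2.

    The quadratic loss is a stand-in convex objective (an assumption).
    """
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    for _ in range(steps):
        # For this loss, the empirical-risk gradient is w minus the sample mean.
        grad = [w[j] - sum(z[j] for z in samples) / n for j in range(dim)]
        w = [w[j] - lr * grad[j] for j in range(dim)]
    return w

def gaussian_surrogate(w, sigma, rng):
    """Perturb the final iterate with noise drawn from N(0, sigma^2 I)."""
    return [wj + rng.gauss(0.0, sigma) for wj in w]

rng = random.Random(0)
# Hypothetical data: 50 samples in 3 dimensions.
samples = [[rng.gauss(1.0, 1.0) for _ in range(3)] for _ in range(50)]
w_final = gradient_descent(samples)
w_noisy = gaussian_surrogate(w_final, sigma=0.05, rng=rng)
```

The noise makes the output a randomized function of the data, which keeps quantities such as the mutual information between the output and the training sample finite. According to the abstract, analyzing such surrogates still cannot establish minimax rates.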
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Statistical Methods and Inference
Methods: None
