Can LLMs predict the convergence of Stochastic Gradient Descent?

Oussama Zekri; Abdelhakim Benechehab; Ievgen Redko

arXiv:2408.01736·cs.LG·August 6, 2024

TL;DR

This paper investigates the ability of large language models to predict the convergence points of stochastic gradient descent in both convex and non-convex optimization, revealing promising zero-shot capabilities.

Contribution

It demonstrates that LLMs can predict SGD convergence points without prior training on specific optimization tasks, leveraging their understanding of Markovian dynamics.

Findings

01 LLMs show zero-shot prediction of SGD local minima.

02 Theoretical link between SGD and Markov chains supports predictions.

03 Potential for using LLMs in zero-shot analysis of deep learning models.

Abstract

Large language models (LLMs) are well known for their impressive performance across a wide range of tasks. One surprising example is the recently identified capacity of LLMs to understand the governing principles of dynamical systems that satisfy the Markov property. In this paper, we explore this direction further by studying the dynamics of stochastic gradient descent (SGD) in convex and non-convex optimization. By leveraging the theoretical link between SGD and Markov chains, we show remarkable zero-shot performance of LLMs in predicting the local minima to which SGD converges from previously unseen starting points. More generally, we ask whether LLMs can be used to perform zero-shot randomized trials for the larger deep learning models used in practice.
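The link between SGD and Markov chains mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code. It assumes a hypothetical one-dimensional convex objective f(x) = 0.5 * x**2, a constant step size, and Gaussian gradient noise, so each iterate depends only on the previous one. The helper names (`sgd_trajectory`, `to_state`, `empirical_transition_matrix`), the bin range, and all parameter values are illustrative assumptions. The iterates are discretized into bins and an empirical transition matrix is estimated from a single trajectory.

```python
import random

def grad(x):
    # Gradient of the assumed objective f(x) = 0.5 * x**2 (convex, minimum at 0).
    return x

def sgd_trajectory(x0, lr=0.1, noise_std=0.5, steps=2000, seed=0):
    # Constant-step SGD with additive Gaussian gradient noise (an assumption).
    rng = random.Random(seed)
    xs = [x0]
    x = x0
    for _ in range(steps):
        # The next iterate depends only on the current one: the Markov property.
        g = grad(x) + rng.gauss(0.0, noise_std)
        x = x - lr * g
        xs.append(x)
    return xs

def to_state(x, low=-2.0, high=2.0, n_bins=8):
    # Map a real iterate to one of n_bins equal-width bins, clipping at the edges.
    idx = int((x - low) / (high - low) * n_bins)
    return min(max(idx, 0), n_bins - 1)

def empirical_transition_matrix(xs, n_bins=8):
    # Count observed bin-to-bin transitions, then normalize each visited row.
    counts = [[0] * n_bins for _ in range(n_bins)]
    states = [to_state(x, n_bins=n_bins) for x in xs]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total else [0.0] * n_bins)
    return matrix

xs = sgd_trajectory(x0=1.5)
P = empirical_transition_matrix(xs)
```

Each visited row of `P` is an estimated probability distribution over the next bin. With these settings the trajectory drifts from the starting point toward the minimum at 0 and then fluctuates around it, so the probability mass concentrates in the central bins.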

Taxonomy

Topics: Cardiac Imaging and Diagnostics · Advanced X-ray and CT Imaging · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent