Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
Alan Jeffares, Alicia Curth, Mihaela van der Schaar

TL;DR
This paper introduces a simple, telescoping model of neural networks that provides practical insights into phenomena like double descent and grokking, aiding understanding of neural network behavior and design choices.
Contribution
It presents a novel, empirically validated model that simplifies neural network analysis and reveals parallels with gradient boosting, enhancing interpretability and predictive capabilities.
Findings
The model yields metrics that help predict and understand unexpected neural network behavior, such as double descent and grokking.
It uncovers parallels between neural networks and gradient boosting.
It offers a pedagogical framework for analyzing training dynamics.
Abstract
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network consisting of a sequence of first-order approximations telescoping out into a single empirically operational tool for practical analysis. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena in the literature -- including double descent, grokking, linear mode connectivity, and the challenges of applying deep learning on tabular data -- highlighting that this model allows us to construct and extract metrics that help predict and understand the a priori unexpected performance of neural networks. We also demonstrate that this model presents a pedagogical formalism allowing us to isolate components…
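As a rough illustration of the "sequence of first-order approximations telescoping out" described in the abstract, the sketch below tracks a small network's test predictions by summing one linearization per training step, f(x; θ_T) ≈ f(x; θ_0) + Σ_t ∇_θ f(x; θ_{t-1})ᵀ (θ_t − θ_{t-1}), and compares the result with a single linearization around the initial parameters. This is one reading of the abstract, not the authors' code: the tiny tanh network, the synthetic data, the use of plain full-batch gradient descent, and every name (`unpack`, `predict`, `param_grad`, `telescoped`) are assumptions made for illustration.

```python
import numpy as np


def unpack(theta, d, h):
    """Split a flat parameter vector into the layers of a one-hidden-layer net."""
    W1 = theta[: h * d].reshape(h, d)
    b1 = theta[h * d : h * d + h]
    w2 = theta[h * d + h : h * d + 2 * h]
    b2 = theta[-1]
    return W1, b1, w2, b2


def predict(theta, X, d, h):
    """Scalar-output network: f(x) = w2 . tanh(W1 x + b1) + b2."""
    W1, b1, w2, b2 = unpack(theta, d, h)
    return np.tanh(X @ W1.T + b1) @ w2 + b2


def param_grad(theta, X, d, h):
    """Jacobian of the predictions w.r.t. theta, shape (n_points, n_params)."""
    W1, b1, w2, b2 = unpack(theta, d, h)
    A = np.tanh(X @ W1.T + b1)                      # hidden activations, (n, h)
    dZ = (1.0 - A**2) * w2                          # df / d pre-activation, (n, h)
    dW1 = (dZ[:, :, None] * X[:, None, :]).reshape(len(X), -1)
    return np.hstack([dW1, dZ, A, np.ones((len(X), 1))])


# Hypothetical toy setup (not from the paper).
rng = np.random.default_rng(0)
d, h, n = 3, 8, 32
X_train = rng.normal(size=(n, d))
y_train = np.sin(X_train.sum(axis=1))
X_test = rng.normal(size=(5, d))

theta0 = rng.normal(scale=0.5, size=h * d + 2 * h + 1)
theta = theta0.copy()
telescoped = predict(theta0, X_test, d, h)          # starts at f(x; theta_0)
lr, steps = 0.05, 200

for _ in range(steps):
    # Full-batch gradient descent on the loss mean(residual^2) / 2.
    residual = predict(theta, X_train, d, h) - y_train
    loss_grad = param_grad(theta, X_train, d, h).T @ residual / n
    new_theta = theta - lr * loss_grad
    # One first-order term per step, linearized at the pre-update parameters.
    telescoped += param_grad(theta, X_test, d, h) @ (new_theta - theta)
    theta = new_theta

exact = predict(theta, X_test, d, h)
single_linearization = (
    predict(theta0, X_test, d, h)
    + param_grad(theta0, X_test, d, h) @ (theta - theta0)
)
telescoping_error = np.max(np.abs(telescoped - exact))
linearized_error = np.max(np.abs(single_linearization - exact))
print("max abs error, telescoped:", telescoping_error)
print("max abs error, single linearization at init:", linearized_error)
```

Because each term is linearized at the parameters just before the update, the per-step approximation error depends on the size of a single update rather than on the total distance travelled from initialization, which is the intuition for preferring the telescoped sum over a single expansion around θ_0.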
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Architecture and Computational Design · Neural Networks and Reservoir Computing
