Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond

Alan Jeffares; Alicia Curth; Mihaela van der Schaar

arXiv:2411.00247·cs.LG·November 4, 2024

TL;DR

This paper introduces a simple, telescoping model of neural networks that provides practical insights into phenomena like double descent and grokking, aiding understanding of neural network behavior and design choices.

Contribution

It presents a novel, empirically validated model that simplifies neural network analysis and reveals parallels with gradient boosting, enhancing interpretability and predictive capabilities.

Findings

01

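Metrics extracted from the model help predict and understand unexpected neural network behavior, including double descent, grokking, and linear mode connectivity.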

02

It uncovers parallels between neural networks and gradient boosting.

03

It offers a pedagogical framework for analyzing training dynamics.

Abstract

Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network consisting of a sequence of first-order approximations telescoping out into a single empirically operational tool for practical analysis. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena in the literature -- including double descent, grokking, linear mode connectivity, and the challenges of applying deep learning on tabular data -- highlighting that this model allows us to construct and extract metrics that help predict and understand the a priori unexpected performance of neural networks. We also demonstrate that this model presents a pedagogical formalism allowing us to isolate components…
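The phrase "a sequence of first-order approximations telescoping out" can be made concrete with a small numerical sketch. The abstract does not spell out the construction, so the following is one plausible reading rather than the authors' implementation: the trained network's prediction is written as the initial prediction plus a sum of per-step changes, and each change is replaced by its first-order Taylor term, i.e. the gradient of the output with respect to the parameters at step t-1 dotted with the parameter update at step t. Everything in the code (the tiny tanh network, the synthetic data, and the names `init_params`, `predict`, `param_grads`, and `telescoping_demo`) is a hypothetical choice made for illustration.

```python
import numpy as np

def init_params(rng, d_in=2, d_hidden=8):
    # Hypothetical one-hidden-layer tanh network with a scalar output.
    return {"W1": rng.normal(size=(d_in, d_hidden)) / np.sqrt(d_in),
            "w2": rng.normal(size=d_hidden) / np.sqrt(d_hidden)}

def predict(params, X):
    return np.tanh(X @ params["W1"]) @ params["w2"]

def param_grads(params, X):
    """Per-example gradient of the scalar output w.r.t. each parameter."""
    H = np.tanh(X @ params["W1"])                      # (n, h)
    back = (1.0 - H ** 2) * params["w2"]               # (n, h)
    dW1 = X[:, :, None] * back[:, None, :]             # (n, d, h)
    return {"W1": dW1, "w2": H}

def telescoping_demo(steps=300, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    X_train = rng.normal(size=(32, 2))
    y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1]  # synthetic target
    X_test = rng.normal(size=(5, 2))

    params = init_params(rng)
    f0 = predict(params, X_test)       # prediction at initialization
    telescoped = f0.copy()             # running first-order reconstruction

    for _ in range(steps):
        residual = predict(params, X_train) - y_train
        g_train = param_grads(params, X_train)
        g_test = param_grads(params, X_test)   # evaluated at theta_{t-1}
        new_params = {}
        for name, value in params.items():
            # Gradient of the mean squared-error loss (with a 1/2 factor).
            grad_loss = np.tensordot(residual, g_train[name], axes=(0, 0))
            delta = -lr * grad_loss / len(y_train)
            # First-order term: grad_theta f(x; theta_{t-1}) . (theta_t - theta_{t-1})
            telescoped += np.tensordot(g_test[name], delta, axes=delta.ndim)
            new_params[name] = value + delta
        params = new_params

    return predict(params, X_test), telescoped, f0

if __name__ == "__main__":
    exact, telescoped, f0 = telescoping_demo()
    print("max |trained - telescoped|:", np.max(np.abs(exact - telescoped)))
    print("max |trained - initial|   :", np.max(np.abs(exact - f0)))
```

Under these assumptions, the accumulated sum should track the trained network's test predictions far more closely than the initial prediction does, because each step's linearization error scales with the squared size of that step's update. The sketch uses full-batch gradient descent only to keep the update rule short; the same accumulation applies to any optimizer once the per-step parameter change is known.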

Code & Models

Repositories

alanjeffares/telescoping-lens (Official)

Videos

Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Architecture and Computational Design · Neural Networks and Reservoir Computing