Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition

Yufei Huang; Shengding Hu; Xu Han; Zhiyuan Liu; Maosong Sun

arXiv:2402.15175·cs.LG·February 27, 2024·1 cites

Open Access

TL;DR

This paper presents a unified framework that explains grokking, double descent, and emergent abilities in neural networks through the competition between memorization and generalization circuits, and extends that view across a range of model sizes and training data volumes.

Contribution

The paper introduces a framework that unifies the understanding of grokking, double descent, and emergent abilities, proposes two verifiable predictions about double descent, and extends the analysis to multi-task learning.

Findings

01 Detailed analysis of the double descent phenomenon

02 Two verifiable predictions about when double descent occurs

03 Extension of the framework to multi-task learning and emergent abilities

Abstract

Recent studies have uncovered intriguing phenomena in deep learning, such as grokking, double descent, and emergent abilities in large language models, which challenge human intuition and are crucial for a deeper understanding of neural models. In this paper, we present a comprehensive framework that provides a unified view of these three phenomena, focusing on the competition between memorization and generalization circuits. This approach, initially employed to explain grokking, is extended in our work to encompass a wider range of model sizes and training data volumes. Our framework delineates four distinct training dynamics, each depending on varying combinations of model size and training data quantity. Utilizing this framework, we provide a detailed analysis of the double descent phenomenon and propose two verifiable predictions regarding its occurrence, both substantiated by our…

Taxonomy

Topics: Merger and Competition Analysis