Nested Learning: The Illusion of Deep Learning Architectures
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni

TL;DR
This paper introduces Nested Learning, a new paradigm that models deep learning architectures as multi-level, nested optimization problems, enabling more expressive algorithms and improved continual learning capabilities.
Contribution
The paper proposes the Nested Learning framework and three components built on it: more expressive optimizers, a self-modifying sequence model, and a continuum memory system.
Findings
Self-modifying sequence model learns to adapt its own update rules.
Continuum memory system enhances continual learning and long-context reasoning.
Promising results in language modeling and few-shot tasks.
Abstract
Despite recent progress, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find effective solutions. In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model as a set of nested, multi-level, and/or parallel optimization problems, each with its own context flow. Through the lens of NL, existing deep learning methods learn from data by compressing their own context flow, and in-context learning naturally emerges in large models. NL suggests a philosophy for designing more expressive learning algorithms with more levels, resulting in higher-order in-context learning and potentially unlocking effective continual learning capabilities. We advocate for NL by presenting three…
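As a rough illustration of what "a set of nested, multi-level optimization problems, each with its own context flow" could look like, the toy sketch below fits a single scalar with two levels that share one loss. It is not the paper's algorithm. The names Level, period, and train are hypothetical, and the choice to give each level its own update frequency and its own window of accumulated gradients is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """One optimization level (hypothetical structure, not from the paper)."""
    weight: float
    lr: float
    period: int            # the level updates once every `period` steps
    grad_sum: float = 0.0  # gradients accumulated since the last update


def train(stream, levels):
    """Fit y ~ (sum of level weights) * x, each level updating at its own rate."""
    for step, (x, y) in enumerate(stream, start=1):
        pred = sum(level.weight for level in levels) * x
        # Gradient of 0.5 * (pred - y)**2 with respect to any level's weight.
        grad = (pred - y) * x
        for level in levels:
            level.grad_sum += grad
            if step % level.period == 0:
                # Each level steps on the average gradient over its own window.
                level.weight -= level.lr * level.grad_sum / level.period
                level.grad_sum = 0.0
    return levels


# Toy data stream: y = 3 * x, repeated over a few input values.
xs = [1.0, -1.0, 0.5, 2.0] * 50
stream = [(x, 3.0 * x) for x in xs]

fast, slow = train(stream, [Level(0.0, 0.1, 1), Level(0.0, 0.05, 4)])
total = fast.weight + slow.weight
```

Here the fast level reacts to every sample while the slow level only sees a four-step summary, so the two levels solve the same problem over different horizons. Under these assumptions, total ends up close to the target slope of 3.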
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks
