Nested Learning: The Illusion of Deep Learning Architectures

Ali Behrouz; Meisam Razaviyayn; Peilin Zhong; Vahab Mirrokni

arXiv:2512.24695·cs.LG·January 1, 2026

Open Access · 1 model

TL;DR

This paper introduces Nested Learning, a new paradigm that models deep learning architectures as multi-level, nested optimization problems, enabling more expressive algorithms and improved continual learning capabilities.

Contribution

The paper proposes the Nested Learning framework and, within it, presents more expressive optimizers, a self-modifying sequence model, and a continuum memory system.

Findings

01

Self-modifying sequence model learns to adapt its own update rules.

02

Continuum memory system enhances continual learning and long-context reasoning.

03

Promising results in language modeling and few-shot tasks.

Abstract

Despite recent progress, particularly in developing language models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, self-improve, and find effective solutions. In this paper, we present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model as a set of nested, multi-level, and/or parallel optimization problems, each with its own context flow. Through the lens of NL, existing deep learning methods learn from data by compressing their own context flow, and in-context learning naturally emerges in large models. NL suggests a philosophy for designing more expressive learning algorithms with more levels, resulting in higher-order in-context learning and potentially unlocking effective continual learning capabilities. We advocate for NL by presenting three…

Code & Models

Models

🤗 GurkeBaui/Karla (model)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Graph Neural Networks