Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting

Ran Cheng

arXiv:2603.07415 · cs.LG · March 10, 2026

Open Access

TL;DR

This paper introduces Context Channel Capacity ($C_{ctx}$), an information-theoretic measure for understanding and predicting catastrophic forgetting in continual learning architectures, and validates it experimentally across 8 continual learning methods.

Contribution

It proposes a theoretical framework that uses $C_{ctx}$ to explain forgetting, shows that HyperNetworks bypass the Impossibility Triangle, and provides diagnostic tools and a taxonomy of research directions.

Findings

01. $C_{ctx}$ accurately predicts forgetting behavior across methods.

02. HyperNetworks achieve near-zero forgetting by redefining parameters as function values rather than states.

03. The framework is validated empirically on Split-MNIST and CIFAR-10.
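
The idea in finding 02, that a HyperNetwork treats parameters as function values rather than stored states, can be illustrated with a minimal sketch. This is not the paper's architecture: the class name `TinyHyperNetwork`, the linear generator, and all dimensions are hypothetical choices made only for illustration, and no training is performed.

```python
import numpy as np


class TinyHyperNetwork:
    """Toy linear hypernetwork (hypothetical): task embedding -> weights of a linear layer."""

    def __init__(self, num_tasks, embed_dim, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # One context vector per task; this is the "context signal" in the toy setup.
        self.task_embeddings = rng.normal(size=(num_tasks, embed_dim))
        # Shared generator that maps a context vector to a flat weight vector.
        self.generator = 0.1 * rng.normal(size=(embed_dim, in_dim * out_dim))

    def generate_weights(self, task_id):
        # Weights are recomputed from the context on every call, never stored as state.
        flat = self.task_embeddings[task_id] @ self.generator
        return flat.reshape(self.in_dim, self.out_dim)

    def forward(self, x, task_id):
        return x @ self.generate_weights(task_id)


hyper = TinyHyperNetwork(num_tasks=5, embed_dim=8, in_dim=4, out_dim=3)
w0 = hyper.generate_weights(0)
w1 = hyper.generate_weights(1)        # a different context yields different weights
w0_again = hyper.generate_weights(0)  # revisiting task 0 regenerates the same weights
y = hyper.forward(np.ones((2, 4)), task_id=1)
print(np.allclose(w0, w0_again), np.allclose(w0, w1), y.shape)
```

In this sketch, producing weights for task 1 does not overwrite anything needed for task 0, because the target-layer weights exist only as outputs of `generate_weights`.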

Abstract

Catastrophic forgetting remains a central challenge in continual learning (CL), yet lacks a unified information-theoretic explanation for why some architectures forget catastrophically while others do not. We introduce Context Channel Capacity ($C_{ctx}$), the mutual information between a CL architecture's context signal and its generated parameters, and prove that zero forgetting requires $C_{ctx} \geq H(T)$, where $H(T)$ is the task identity entropy. We establish an Impossibility Triangle -- zero forgetting, online learning, and finite parameters cannot be simultaneously satisfied by sequential state-based learners -- and show that conditional regeneration architectures (HyperNetworks) bypass this triangle by redefining parameters as function values rather than states. We validate this framework across 8 CL methods on Split-MNIST (1,130+ experiments over…
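
The condition $C_{ctx} \geq H(T)$ stated in the abstract can be checked by hand in a small discrete setting. The sketch below assumes five equiprobable tasks (a common Split-MNIST setup, though the task count is an assumption here) and two made-up joint distributions over context signals and generated parameter sets. The helper names `entropy_bits` and `mutual_information_bits` are hypothetical, and the paper's own estimator may differ.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def mutual_information_bits(joint):
    """I(X; Y) in bits from a joint probability table joint[x, y]."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)  # marginal over contexts, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal over parameter sets, shape (1, m)
    mask = joint > 0
    ratio = joint[mask] / (px @ py)[mask]
    return float((joint[mask] * np.log2(ratio)).sum())


num_tasks = 5  # assumption: five equiprobable tasks
h_task = entropy_bits(np.full(num_tasks, 1.0 / num_tasks))

# Toy case A: each context maps to its own distinct parameter set.
joint_full = np.eye(num_tasks) / num_tasks
# Toy case B: every context yields the same parameter set (no usable context channel).
joint_none = np.zeros((num_tasks, num_tasks))
joint_none[:, 0] = 1.0 / num_tasks

c_full = mutual_information_bits(joint_full)
c_none = mutual_information_bits(joint_none)

print(f"H(T) = {h_task:.4f} bits")
print(f"case A: C_ctx = {c_full:.4f} bits, meets bound: {c_full >= h_task - 1e-9}")
print(f"case B: C_ctx = {c_none:.4f} bits, meets bound: {c_none >= h_task - 1e-9}")
```

In case A the mutual information equals $H(T) = \log_2 5$ bits, so the necessary condition holds; in case B it is zero, so the condition fails for any setup with more than one task.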

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Memory Processes and Influences · Visual Attention and Saliency Detection