Context Channel Capacity: An Information-Theoretic Framework for Understanding Catastrophic Forgetting
Ran Cheng

TL;DR
This paper introduces Context Channel Capacity ($C_{ctx}$), an information-theoretic measure for understanding and predicting catastrophic forgetting in continual learning architectures, and validates it empirically on Split-MNIST and CIFAR-10.
Contribution
It proposes an information-theoretic framework built on $C_{ctx}$ to explain forgetting, shows that HyperNetworks bypass the Impossibility Triangle, and provides diagnostic tools and a taxonomy of research directions.
Findings
$C_{ctx}$ accurately predicts forgetting behavior across the 8 CL methods evaluated.
HyperNetworks achieve near-zero forgetting by redefining parameters as function values of a context signal rather than as stored states.
Empirical validation on Split-MNIST and CIFAR-10 datasets.
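To illustrate what "parameters as function values" means, here is a minimal NumPy sketch, assuming a hypothetical linear hypernetwork `LinearHyperNetwork` with fixed random weights and Split-MNIST-sized shapes (784 inputs, 2 outputs per task). It is not the paper's architecture and omits training of the hypernetwork weights entirely; it only shows that target weights are regenerated from a per-task context embedding instead of being kept as one state that each new task overwrites.

```python
import numpy as np


class LinearHyperNetwork:
    """Hypothetical toy hypernetwork: maps a task embedding e_t to the
    weights of a linear target classifier, W_t = reshape(phi @ e_t)."""

    def __init__(self, emb_dim, in_dim, out_dim, seed=0):
        self._rng = np.random.default_rng(seed)
        self.emb_dim, self.in_dim, self.out_dim = emb_dim, in_dim, out_dim
        # Hypernetwork weights phi: one row per generated target parameter.
        # Kept fixed here; a real method would train them.
        self.phi = self._rng.normal(scale=0.1, size=(out_dim * in_dim, emb_dim))
        # Context signal: one embedding per task. No target weights are stored.
        self.task_embeddings = {}

    def add_task(self, task_id):
        """Register a new task by adding a context embedding for it."""
        self.task_embeddings[task_id] = self._rng.normal(size=self.emb_dim)

    def generate(self, task_id):
        """Regenerate target weights as a function value of the task context."""
        flat = self.phi @ self.task_embeddings[task_id]
        return flat.reshape(self.out_dim, self.in_dim)

    def predict(self, task_id, x):
        """Apply the generated linear classifier to a batch x of shape (n, in_dim)."""
        return x @ self.generate(task_id).T


hnet = LinearHyperNetwork(emb_dim=8, in_dim=784, out_dim=2)
hnet.add_task(0)
w0_before = hnet.generate(0)
hnet.add_task(1)  # a new task adds context; it does not overwrite task 0's weights
w0_after = hnet.generate(0)
print("task 0 weights unchanged:", np.array_equal(w0_before, w0_after))
```

With the hypernetwork weights frozen, task 0's weights are reproduced exactly after task 1 is added; in a trained system the open question shifts to how updates to `phi` affect earlier task embeddings.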
Abstract
Catastrophic forgetting remains a central challenge in continual learning (CL), yet it lacks a unified information-theoretic explanation for why some architectures forget catastrophically while others do not. We introduce Context Channel Capacity ($C_{ctx}$), the mutual information between a CL architecture's context signal and its generated parameters, and prove that zero forgetting requires $C_{ctx} \geq H(T)$, where $H(T)$ is the task identity entropy. We establish an Impossibility Triangle (zero forgetting, online learning, and finite parameters cannot be simultaneously satisfied by sequential state-based learners) and show that conditional regeneration architectures (HyperNetworks) bypass this triangle by redefining parameters as function values rather than states. We validate this framework across 8 CL methods on Split-MNIST (1,130+ experiments over…
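The zero-forgetting bound $C_{ctx} \geq H(T)$, writing $H(T)$ for the task identity entropy, can be illustrated with a toy plug-in estimate. The sketch below assumes a discrete context signal, a uniform distribution over 5 tasks, and a hypothetical deterministic generator `generate_params`; it estimates mutual information from empirical counts and is not the paper's estimator. Because the toy generator is deterministic and injective, $I(c; \theta)$ here reduces to the entropy of the context, so only a context that distinguishes every task reaches $H(T)$.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    return sum((k / n) * math.log2(n / k) for k in Counter(samples).values())


def mutual_information(xs, ys):
    """Empirical I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def generate_params(context):
    """Hypothetical deterministic generator: maps a context value to a
    discretized parameter vector. Distinct contexts give distinct vectors."""
    return (context, 2 * context + 1)


def c_ctx(contexts):
    """Toy C_ctx: I(context; generated parameters) over the given samples."""
    params = [generate_params(c) for c in contexts]
    return mutual_information(contexts, params)


NUM_TASKS = 5                          # e.g. the 5 binary tasks of Split-MNIST
tasks = list(range(NUM_TASKS)) * 100   # uniform task distribution
h_task = entropy(tasks)                # H(T) = log2(5), about 2.32 bits

scenarios = {
    "task-id context": tasks,                     # context reveals the task
    "coarse context": [t // 2 for t in tasks],    # merges tasks {0,1} and {2,3}
    "constant context": [0 for _ in tasks],       # one shared parameter state
}
for name, ctx in scenarios.items():
    cap = c_ctx(ctx)
    meets = cap >= h_task - 1e-9
    print(f"{name:16s} C_ctx = {cap:.3f} bits, meets H(T) bound: {meets}")
```

Merging task pairs drops the estimate to about 1.52 bits, below $H(T) \approx 2.32$ bits, and a constant context carries 0 bits, so in both cases the generated parameters cannot tell every task apart.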
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Memory Processes and Influences · Visual Attention and Saliency Detection
