Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning
Katarzyna Filus, Kamil Faber, Roberto Corizzo, Christopher Kanan

TL;DR
This paper introduces a diagnostic framework using Sparse Autoencoders to analyze concept-level forgetting in continual learning, revealing that much forgetting relates to reduced accessibility rather than information loss.
Contribution
It proposes a novel SAE-based method to disentangle and analyze concept-level forgetting, providing insights into the internal representational changes during continual learning.
Findings
Much of the forgotten concept information can be recovered under a linearity assumption.
Concept decodability degrades as more tasks are learned.
Forgetting often relates to accessibility, not complete information loss.
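The distinction between reduced accessibility and actual information loss can be illustrated with a toy example. The sketch below is not the paper's method: it assumes, purely for illustration, that representational drift is an invertible linear map (a random rotation), and that a concept is read out by a fixed linear direction. All names (`feats_old`, `concept_dir`, `rotation`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 500, 16
feats_old = rng.normal(size=(n, d))   # toy representations before new tasks
concept_dir = rng.normal(size=d)      # hypothetical fixed readout for one concept
concept = feats_old @ concept_dir     # concept activation under the old model

# Assumption: drift is a random orthogonal rotation, so the information
# is preserved but moved to different directions.
rotation, _ = np.linalg.qr(rng.normal(size=(d, d)))
feats_new = feats_old @ rotation

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

train, test = slice(0, 400), slice(400, n)

# Accessibility: apply the old, frozen readout to the drifted features.
r2_frozen = r2(concept[test], feats_new[test] @ concept_dir)

# Recoverability: refit a linear readout on the drifted features.
w, *_ = np.linalg.lstsq(feats_new[train], concept[train], rcond=None)
r2_refit = r2(concept[test], feats_new[test] @ w)

print(f"frozen readout R^2: {r2_frozen:.3f}")
print(f"refit readout  R^2: {r2_refit:.3f}")
```

In this toy setting the frozen readout scores poorly while the refit readout recovers the concept almost exactly, which is the pattern one would describe as "hidden" rather than "lost".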
Abstract
Continual learning studies how models can adapt to new tasks while retaining previously acquired knowledge. Although a broad spectrum of methods has been proposed to mitigate catastrophic forgetting, the field remains predominantly performance-driven, with limited insight into what forgetting actually corresponds to within the vision model's representation space. Prior work has primarily analyzed forgetting through task-level performance or coarse measures of representational drift, without disentangling output-level accessibility from changes in finer-grained internal structure. To this end, we propose a diagnostic framework that leverages Sparse Autoencoders (SAEs) to define a task-anchored latent feature space, enabling analysis of how task-specific information evolves at a finer granularity, where individual SAE latents are treated as concept proxies for recurring and relatively…
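For readers unfamiliar with Sparse Autoencoders, the following is a minimal sketch of a generic SAE forward pass and loss (ReLU encoder, linear decoder, L1 sparsity penalty). It is a common textbook formulation, not necessarily the architecture used in the paper; the sizes, the weight initialization, and the `l1_coeff` value are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the latent space is overcomplete (d_latent > d_model).
d_model, d_latent = 8, 32

W_enc = rng.normal(scale=0.1, size=(d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(scale=0.1, size=(d_latent, d_model))
b_dec = np.zeros(d_model)

def sae_encode(x):
    """Map activations to non-negative latents (concept proxies)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)

def sae_decode(z):
    """Reconstruct activations from latents."""
    return z @ W_dec + b_dec

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse latents."""
    z = sae_encode(x)
    x_hat = sae_decode(z)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity

x = rng.normal(size=(4, d_model))   # stand-in for a batch of model activations
z = sae_encode(x)
print(z.shape, sae_loss(x))
```

Under this formulation, each column of `z` is one latent whose activation across inputs can be tracked as the underlying model is trained on further tasks.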
