Chunking: Continual Learning is not just about Distribution Shift
Thomas L. Lee, Amos Storkey

TL;DR
This paper isolates the chunking sub-problem in continual learning (CL): data arrives in chunks, so only part of it is available for training at any point in time. In the authors' experiments, chunking accounts for around half of the performance drop from offline learning, and current CL algorithms do not address it.
Contribution
The paper identifies chunking as a critical and neglected sub-problem in continual learning, demonstrating its impact and proposing that addressing it can improve CL performance.
Findings
Chunking accounts for around half of the performance drop from offline learning in the authors' experiments.
When there is no shift in the data distribution, current CL algorithms perform only as well as plain SGD training.
Performance gains from addressing chunking transfer to settings with distribution shift.
Abstract
Work on continual learning (CL) has thus far largely focused on the problems arising from shifts in the data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem, the chunking of data. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. Therefore, we show that chunking is both an important and currently unaddressed sub-problem and until it is addressed CL methods…
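The chunking setting described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' experimental code: it assumes a synthetic binary classification task and a linear logistic model, and the names `make_chunks`, `sgd_epochs`, `train_chunked`, and `accuracy` are hypothetical helpers introduced here. Chunks are drawn by shuffling the whole dataset, so every chunk follows the same distribution and the only difficulty is that earlier chunks cannot be revisited.

```python
import numpy as np


def make_chunks(num_examples, num_chunks, seed=0):
    """Shuffle all indices, then split them into disjoint, near-equal chunks.

    Because the split is a uniform shuffle, there is no distribution shift
    between chunks (assumption made for this sketch).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_examples)
    return np.array_split(perm, num_chunks)


def sgd_epochs(w, X, y, epochs=5, lr=0.1, batch_size=32, seed=0):
    """Plain mini-batch SGD on the logistic loss, using only the data passed in."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            logits = np.clip(X[idx] @ w, -30.0, 30.0)  # avoid overflow in exp
            probs = 1.0 / (1.0 + np.exp(-logits))
            w = w - lr * X[idx].T @ (probs - y[idx]) / len(idx)
    return w


def train_chunked(X, y, num_chunks, **sgd_kwargs):
    """Train on one chunk at a time; earlier chunks are never seen again.

    num_chunks=1 corresponds to offline learning on the full dataset.
    """
    w = np.zeros(X.shape[1])
    for chunk in make_chunks(len(X), num_chunks):
        w = sgd_epochs(w, X[chunk], y[chunk], **sgd_kwargs)
    return w


def accuracy(w, X, y):
    """Fraction of examples whose predicted label matches y (labels in {0, 1})."""
    return float(np.mean((X @ w > 0) == (y == 1)))


if __name__ == "__main__":
    # Synthetic data: an assumption for illustration only.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 10))
    w_true = rng.normal(size=10)
    y = (X @ w_true > 0).astype(float)
    X_train, y_train, X_test, y_test = X[:1000], y[:1000], X[1000:], y[1000:]

    for num_chunks in (1, 10, 50):
        w = train_chunked(X_train, y_train, num_chunks)
        print(num_chunks, "chunks, test accuracy:", accuracy(w, X_test, y_test))
```

Comparing `num_chunks=1` against larger values gives the offline versus chunked comparison the abstract refers to. A linear model on easy synthetic data should not be expected to reproduce the size of the drop reported in the paper; the sketch only shows the training protocol.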
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent
