Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo, Hyunseo Koh, Wonje Jeung, Minjae Lee, San Kim, Hankook Lee, Sungjun Cho, Sungik Choi, Hyunwoo Kim, Jonghyun Choi

TL;DR
This paper introduces a novel online continual learning approach leveraging neural collapse to form equiangular representations, enabling single-epoch training to effectively adapt to streamed data and outperform existing methods.
Contribution
The paper proposes a new method that induces neural collapse to improve online continual learning efficiency and performance with minimal training epochs.
Findings
Outperforms state-of-the-art methods on multiple datasets.
Effective in various online learning scenarios including disjoint and boundary-free setups.
Demonstrates the benefit of neural collapse in representation learning for continual learning.
Abstract
Online continual learning suffers from an underfitted solution due to insufficient training for prompt model update (e.g., single-epoch training). To address the challenge, we propose an efficient online continual learning method using the neural collapse phenomenon. In particular, we induce neural collapse to form a simplex equiangular tight frame (ETF) structure in the representation space so that the continuously learned model with a single epoch can better fit to the streamed data by proposing preparatory data training and residual correction in the representation space. With an extensive set of empirical validations using CIFAR-10/100, TinyImageNet, ImageNet-200, and ImageNet-1K, we show that our proposed method outperforms state-of-the-art methods by a noticeable margin in various online continual learning scenarios such as disjoint and Gaussian scheduled continuous (i.e.,…
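The simplex ETF structure named in the abstract can be illustrated with a short sketch. This is not the authors' code: it uses the standard textbook construction, and the function name `simplex_etf`, the random orthonormal basis, and the dimensions are assumptions chosen for illustration. The property it demonstrates is that K unit-norm class vectors all have the same pairwise cosine similarity of -1/(K-1), the maximum equal separation possible.

```python
import numpy as np


def simplex_etf(num_classes: int, feat_dim: int, seed: int = 0) -> np.ndarray:
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes >= 2.
    """
    if num_classes < 2 or feat_dim < num_classes:
        raise ValueError("need feat_dim >= num_classes >= 2")
    rng = np.random.default_rng(seed)
    # Reduced QR of a random Gaussian matrix gives orthonormal columns.
    basis, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))
    k = num_classes
    # Centering matrix I - (1/K) * 1 1^T removes the mean direction.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * basis @ centering


etf = simplex_etf(num_classes=10, feat_dim=64)
gram = etf.T @ etf  # pairwise cosine similarities, since columns are unit norm
off_diagonal = gram[~np.eye(10, dtype=bool)]
```

With a fixed target of this kind, a feature vector can be classified by taking the column with the largest inner product, for example `np.argmax(feature @ etf)`. How the paper trains features toward these targets, and how it applies preparatory data training and residual correction, is not shown here.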
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The paper was an interesting read.
- The paper does a good job of introducing the concepts of neural collapse and equiangular tight frames.
- I think there might not be sufficient novelty for publication at a top venue like ICLR.
- I am not sure whether the construction of preparatory data is sound. Why would the rotation of an image change its class?
- The comparison with other methods is probably not completely fair, since EARL uses more memory (due to the storage of feature-residual pairs).
- Moreover, EARL is more expensive than a simple classifier at inference time, because it performs residual correction. I don't k
1. Inducing neural collapse to accelerate fitting to newly arrived data is a novel approach in CIL. In the experiments, the authors demonstrate the effectiveness of the ETF classifier combined with preparatory data training and stored residual information, showing both reduced cosine similarity between the features and improved performance.
1. Although it is hard to achieve remarkable performance in the online continual learning scenario with large-scale datasets (e.g., ImageNet-1K), running the proposed algorithm on a large-scale dataset would strengthen the results.
The manuscript in general is clearly articulated and addresses a relevant problem. The combination of the proposed three-component approach appears to be the primary contribution of the paper. Furthermore, inducing neural collapse in continual learning prior to reaching the TPT phase while minimizing perturbation of the old class poses a challenging aspect.
The introduction section could benefit from further refinement to enhance clarity. It would be helpful if the central phenomenon driving the paper was elaborated upon more explicitly. For example, the relationship between Continual Learning as an inherently imbalanced problem and the role of Neural Collapse in addressing unbalanced data could be made clearer, especially in the context of new classes being biased towards the features of older classes. Specifically, the manuscript mentions: “the p
1. The paper is well written and handles a challenging variant of continual learning, online continual learning. 2. Empirical evaluations on various datasets demonstrate the efficacy of the proposed approach in mitigating catastrophic forgetting compared with existing continual learning baselines.
I am mainly concerned about the empirical evaluation and scalability of the proposed method. 1. This method employs memory-only training; however, this might negatively affect the performance of the existing continual learning baselines. Therefore, I would suggest that the authors train the baselines following the steps described in their respective papers, such as REMIND, ER-MIR, EWC, and DER++. 2. The proposed method uses KNN during inference which could be a time consuming process with
Taxonomy
Topics: Machine Learning and Algorithms · Experimental Learning in Engineering · Indoor and Outdoor Localization Technologies
Methods: Sparse Evolutionary Training
