Learning with Preserving for Continual Multitask Learning
Hanchen David Wang, Siwoo Bae, Zirong Chen, Meiyi Ma

TL;DR
This paper introduces Learning with Preserving (LwP), a novel continual multitask learning framework that maintains the geometric structure of shared representations to prevent forgetting and improve performance without needing replay buffers.
Contribution
LwP shifts the focus from preserving task outputs to preserving the geometric structure of the shared representation, and introduces a Dynamically Weighted Distance Preservation (DWDP) loss for continual multitask learning.
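To make the idea of preserving geometric structure concrete, the following is a minimal NumPy sketch of a generic weighted pairwise-distance preservation penalty. It is not the paper's DWDP loss, whose exact definition is not given on this page. The function names, the use of squared Euclidean distances, and the same-label weighting rule are all assumptions chosen for illustration.

```python
import numpy as np


def pairwise_sq_dists(z):
    """Squared Euclidean distance between every pair of rows of z, shape (n, d)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def same_label_weights(labels):
    """Hypothetical weighting rule: 1.0 for pairs sharing a current-task label, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def weighted_distance_preservation(z_old, z_new, weights=None):
    """Weighted mean squared change in pairwise distances between two embeddings.

    z_old: representations from the frozen, previously trained encoder, shape (n, d).
    z_new: representations of the same inputs from the encoder being trained.
    weights: optional (n, n) array of per-pair weights (an assumption, not the
             paper's rule); defaults to uniform weights.
    """
    d_old = pairwise_sq_dists(z_old)
    d_new = pairwise_sq_dists(z_new)
    if weights is None:
        weights = np.ones_like(d_old)
    total = weights.sum()
    if total == 0:
        return 0.0  # no pair is selected, so there is nothing to preserve
    return float(np.sum(weights * (d_old - d_new) ** 2) / total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_old = rng.normal(size=(6, 4))
    # A rotation plus a shift keeps all pairwise distances, so the penalty is near zero.
    q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
    print(weighted_distance_preservation(z_old, z_old @ q + 3.0))
    # Rescaling changes the distances, so the penalty is positive.
    w = same_label_weights([0, 0, 1, 1, 0, 1])
    print(weighted_distance_preservation(z_old, 2.0 * z_old, w))
```

A penalty of this kind constrains only the relative layout of the representations, not the outputs of any particular task head, which is the distinction the contribution statement draws.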
Findings
LwP outperforms state-of-the-art methods on time-series and image benchmarks.
LwP demonstrates robustness to distribution shifts.
LwP surpasses single-task learning baselines in continual learning scenarios.
Abstract
Artificial intelligence systems in critical fields like autonomous driving and medical imaging analysis often continually learn new tasks using a shared stream of input data. For instance, after learning to detect traffic signs, a model may later need to learn to classify traffic lights or different types of vehicles using the same camera feed. This scenario introduces a challenging setting we term Continual Multitask Learning (CMTL), where a model sequentially learns new tasks on an underlying data distribution without forgetting previously learned abilities. Existing continual learning methods often fail in this setting because they learn fragmented, task-specific features that interfere with one another. To address this, we introduce Learning with Preserving (LwP), a novel framework that shifts the focus from preserving task outputs to maintaining the geometric structure of the…
Peer Reviews
Decision · Submitted to ICLR 2025
1. A new continual learning setting is introduced. 2. A method tailored to the new setting is designed.
1. Compared to general CL scenarios such as CIL (class-incremental learning), DIL (domain-incremental learning), and TIL (task-incremental learning), the proposed CMTL setting can be seen as an idealized, simplified version. CMTL assumes that the input data come from the same distribution, so all tasks are performed on the same data domain, without considering the case where the data distribution drifts over time. In real-world applications, the data distribution of subsequent tasks may differ from the previous ones
1. The paper proposes a new scenario of continual learning, CMTL, highlighting its unique challenges and significance in practical applications. 2. The LwP framework is innovative in preserving previously learned knowledge in a way that remains applicable and beneficial across diverse tasks. 3. The experimental results suggest that LwP demonstrates competitive performance compared to existing continual learning methods.
1. How does the proposed method address the fundamental challenges in continual learning, such as catastrophic forgetting or the stability-plasticity dilemma? 2. The Dynamically Weighted Distance Preservation (DWDP) loss is an innovative contribution. However, it would be valuable to delve deeper into the theoretical foundations of DWDP, exploring its relationship to other distance-preserving techniques and providing additional insights into why it is effective for preserving implicit knowledge.
Performance on all the benchmarks is impressive. Figure 5 clearly shows the minimal loss in performance on previous tasks as learning progresses. The benefit of the Learning with Preserving (LwP) loss as a regularization is very solid, as can be seen in Figure 5 and Table 1, and LwP performs considerably well compared to other approaches. The evaluation has good coverage, with 3 vision benchmarks, and shows the distributions of the latents in t-SNE plots. The paper also mea
It is not clear how this CMTL problem is novel; it is the same as in the early LwF papers, yet the paper claims it as one of the contributions. Please address this in the rebuttal. While the results are impressive, I am a bit skeptical about the scale of the datasets: everything has been trained at a smaller scale and low resolution. It would be nice to show some results on larger-resolution images and models. It would also be nice to show that this approach can work for other architectures like ViT. I believe it should w
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
