RECAST: Reparameterized, Compact weight Adaptation for Sequential Tasks
Nazia Tasnim, Bryan A. Plummer

TL;DR
RECAST introduces a highly parameter-efficient method for sequential task adaptation that leverages weight decomposition and a novel neural mimicry pipeline, enabling effective incremental learning with minimal trainable parameters.
Contribution
The paper presents RECAST, a new approach that significantly reduces task-specific parameters using soft weight sharing and neural mimicry, outperforming existing methods in incremental learning.
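Soft weight sharing can be illustrated with a small sketch. It assumes that each layer weight is flattened into a vector and expressed as a linear combination of K frozen templates, so only the K coefficients are trainable. This parameterization is our assumption, not the paper's specification. The sketch also shows one plausible reading of "neural mimicry": fitting the coefficients by ordinary least squares so that the template combination reproduces a given pretrained weight. The paper's actual objective and template construction may differ, and all function names here are hypothetical.

```python
from typing import List

Vector = List[float]


def combine(templates: List[Vector], coeffs: Vector) -> Vector:
    """Reconstruct a flattened layer weight as sum_k coeffs[k] * templates[k]."""
    n = len(templates[0])
    return [sum(c * t[i] for c, t in zip(coeffs, templates)) for i in range(n)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_coefficients(templates: List[Vector], target: Vector) -> Vector:
    """Least-squares coefficients c minimizing ||combine(templates, c) - target||^2.

    Uses the normal equations (T T^T) c = T w, where the rows of T are the templates.
    Assumes the templates are linearly independent, so the Gram matrix is invertible.
    """
    gram = [[dot(ti, tj) for tj in templates] for ti in templates]
    rhs = [dot(t, target) for t in templates]
    return solve(gram, rhs)


# Hypothetical frozen template bank: K = 3 templates for a 4-element layer weight.
templates = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
pretrained = [2.0, -1.0, 3.0, 3.0]  # hypothetical pretrained weight to mimic

coeffs = fit_coefficients(templates, pretrained)
print(coeffs)                      # the only trainable values for this layer
print(combine(templates, coeffs))  # reconstructed weight
print("trainable parameters:", len(coeffs))
```

The template bank stays frozen and shared, so the per-layer trainable state is just `len(coeffs)`. When the target lies outside the span of the templates, the least-squares fit returns the closest reachable weight rather than an exact copy.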
Findings
RECAST outperforms state-of-the-art methods by up to 3% on six datasets.
It reduces trainable parameters to fewer than 50, orders of magnitude fewer than competing methods.
The approach is architecture-agnostic and easily integrates with existing methods.
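To put "fewer than 50" in perspective, the following is a back-of-the-envelope ratio. It assumes the upper bound of 50 trainable parameters and the ~21M-parameter backbones mentioned in the reviews below:

```latex
\frac{50}{21 \times 10^{6}} \approx 2.4 \times 10^{-6}
```

This is on the order of the 2×10^-6 figure discussed in the reviews. Note that it is a fraction of the backbone size, not a parameter count.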
Abstract
Incremental learning aims to adapt to new sets of categories over time with minimal computational overhead. Prior work often addresses this task by training efficient task-specific adaptors that modify frozen layer weights or features to capture relevant information without affecting predictions on previously learned categories. While these adaptors are generally more efficient than finetuning the entire network, they still require tens to hundreds of thousands of task-specific trainable parameters even for relatively small networks. This makes it challenging to operate in resource-constrained environments with high communication costs, such as edge devices or mobile phones. Thus, we propose Reparameterized, Compact weight Adaptation for Sequential Tasks (RECAST), a novel method that dramatically reduces task-specific trainable parameters to fewer than 50, several orders of magnitude fewer than…
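The abstract notes that task-specific adaptation should not disturb predictions on previously learned categories. The storage pattern behind this can be sketched under a hypothetical setup: every task keeps its own small coefficient vector over one shared, frozen template bank. `TemplateLayer`, `add_task`, and `use_task` are illustrative names, and the code shows only the bookkeeping idea, not RECAST's implementation.

```python
from typing import Dict, List

Vector = List[float]


class TemplateLayer:
    """Hypothetical layer: frozen shared templates plus per-task coefficients."""

    def __init__(self, templates: List[Vector]) -> None:
        # Frozen and shared across all tasks; stored as tuples so they are never mutated.
        self._templates = tuple(tuple(t) for t in templates)
        self._task_coeffs: Dict[str, Vector] = {}
        self._active = ""

    def add_task(self, name: str, coeffs: Vector) -> None:
        """Register a task's coefficients, the only task-specific state."""
        if len(coeffs) != len(self._templates):
            raise ValueError("need exactly one coefficient per template")
        self._task_coeffs[name] = list(coeffs)

    def use_task(self, name: str) -> None:
        """Select which task's coefficients are used to build the weight."""
        self._active = name

    def weight(self) -> Vector:
        """Weight for the active task: sum_k c_k * T_k."""
        coeffs = self._task_coeffs[self._active]
        n = len(self._templates[0])
        return [sum(c * t[i] for c, t in zip(coeffs, self._templates)) for i in range(n)]

    def task_specific_params(self) -> int:
        """Number of values stored per task for this layer."""
        return len(self._templates)


layer = TemplateLayer([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
layer.add_task("task_a", [0.5, 2.0])
layer.use_task("task_a")
before = layer.weight()

layer.add_task("task_b", [-1.0, 1.0])  # a new task adds coefficients only
layer.use_task("task_b")
print(layer.weight())

layer.use_task("task_a")
print(layer.weight() == before)  # the earlier task's weight is unchanged
print("task-specific parameters per task:", layer.task_specific_params())
```

Because the templates are shared and never updated, adding a task cannot alter earlier tasks' weights. Switching tasks amounts to selecting a different coefficient vector.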
Peer Reviews
Decision: ICLR 2025 Poster
- The paper studies an important task.
- The motivation for neural mimicry and for learning from proxy domains is helpful in solving CL tasks.
- The experiments are very small scale. The two target models are only ~21M parameters in size (Ln 305), and the paper does not demonstrate scalability to large models.
- No forgetting values are presented, although forgetting is a critical aspect of CL setups.
- Fig. 1 is misleading: it is hard to see how the 2×10^-6 parameters are obtained, since this is a ratio rather than a parameter count. Moreover, it is not thoroughly shown that this quantity holds across networks. Would it be the same for an LLM? Otherwise, stating that the method yields <<<1% can be…
The proposed idea of incorporating Neural Mimicry into continual learning is novel and interesting. By decomposing model weights into frozen template banks and learning only the coefficients, the approach significantly reduces the number of learnable parameters. This paper introduces a new pipeline and valuable insights for the continual learning community. Experiments on both CNN and ViT-small models demonstrate the effectiveness of the proposed method and highlight its architecture-agnostic…
The writing in this paper could be improved. For the pseudo-code, introducing the notation alongside the code would make it easier to follow. Additionally, the paper should clearly explain how the pre-trained model weights are obtained. The effectiveness of the method on larger models and datasets has not yet been adequately addressed.
1. The proposed method reduces the number of learnable parameters for incremental learning.
2. In the experiments, the proposed method shows better accuracy with fewer trainable parameters when combined with previous incremental learning methods.
1. The paper is not well written or well organized. For example:
   (1) The proposed method and related work are mixed together in Section 2.
   (2) Subsection 2.1 introduces too many concepts from later sections, making it difficult to follow.
   (3) The opening paragraphs of Subsections 2.1 and 2.2 are largely redundant.
   (4) The overview of the proposed RECAST method (Fig. 2) should be introduced at the beginning of Section 2, but it isn't presented until Subsection…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
