Projected Subnetworks Scale Adaptation
Siddhartha Datta, Nigel Shadbolt

TL;DR
This paper introduces a method for updating large models by manipulating projected subnetworks, enabling them to retain performance on previous and new tasks in zero/few-shot settings.
Contribution
It proposes an approach for maintaining large-model performance across multiple tasks by treating the parameter updates of a gradient-based meta-learner as projected task-specific subnetworks.
Findings
Improved retention of seen task performance in large models.
Enhanced zero-shot and few-shot task capabilities after updates.
Effective in online learning scenarios.
Abstract
Large models offer strong zero-shot and few-shot capabilities. However, updating these models on new tasks can degrade performance on previously seen tasks and on their associated zero/few-shot unseen tasks. Our work explores how to update zero/few-shot learners so that they maintain performance on the seen and unseen tasks associated with previous tasks as well as on new tasks. By treating the parameter updates of a gradient-based meta-learner as projected task-specific subnetworks, we show improvements in the ability of large models to retain seen and zero/few-shot task performance in online settings.
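The abstract describes working with parameter updates as objects in their own right. The sketch below illustrates only that general idea, under loud assumptions: it uses plain linear interpolation in weight space and a Euclidean drift measure, it is not the paper's SNP/SNP++ procedure, and the function names (`task_update`, `apply_updates`, `drift`) are hypothetical.

```python
import numpy as np

def task_update(base, adapted):
    """Per-tensor parameter update for one task: adapted minus base."""
    return {k: adapted[k] - base[k] for k in base}

def apply_updates(base, updates, alpha):
    """Add the mean of several task updates to the base, scaled by alpha.

    alpha = 0 returns the base parameters unchanged; alpha = 1 applies the
    full averaged update. This is an assumed merging rule for illustration.
    """
    merged = {}
    for k in base:
        mean_delta = np.mean([u[k] for u in updates], axis=0)
        merged[k] = base[k] + alpha * mean_delta
    return merged

def drift(base, params):
    """Euclidean distance between two parameter sets (a simple drift measure)."""
    return float(np.sqrt(sum(np.sum((params[k] - base[k]) ** 2) for k in base)))

# Toy example: one 3-element weight vector and two task-adapted copies.
base = {"w": np.zeros(3)}
task_a = {"w": np.array([1.0, 0.0, 0.0])}
task_b = {"w": np.array([0.0, 1.0, 0.0])}

updates = [task_update(base, task_a), task_update(base, task_b)]
merged = apply_updates(base, updates, alpha=1.0)
print(merged["w"])          # [0.5 0.5 0. ]
print(drift(base, merged))  # distance of the merged weights from the base
```

Keeping `alpha` explicit makes the trade-off visible: smaller values stay closer to the base parameters (less drift), larger values move further toward the averaged task updates.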
Peer Reviews
Decision: Submitted to ICLR 2024
- interesting and exciting topic
- interesting results comparing CLIP and MAML
- extensive empirical results

- I believe there are some typos in equations, or otherwise the descriptions are not clear
- Writing clarity, structure and flow need to be improved
- the setup seems compromised as the accuracy in unseen datasets is used to guide model training
- the paper is written as a technical report describing what was done. For a scientific paper, I would expect a lot more high-level motivation of design and theory supporting the algorithm development.
- The paper claims that the subnetwork represent
- The authors present some interesting ideas and insights on the behavior of parameter drift in meta-learning scenarios and how to potentially use distance regularization in this space
- They also try to improve clarity by providing their algorithms in pseudo code

While there definitely are some interesting points to this method, there is significant further work that needs to be done by the authors on their manuscript. Please see Weaknesses section.

While I can see that the authors are following a potentially interesting idea and I am aware that some of the ‘issues’ might be due to language difficulties, the current state of the manuscript does in my opinion not warrant publication due to several severe issues. Please see below.

---

### Clarity of story line, goal and contributions:

The paper is unfortunately rather hard to read and follow. Even reading through the abstract is somewhat confusing and it doesn’t entirely become clear what t
* The experimental results are quite convincing once the third task is reached. Before that, performance closely resembles that of PAINT-style adaptation.
* The suite of adaptation experiments is quite expansive and covers a dozen common evaluation benchmarks.

The primary issue with this paper comes from its presentation and writing, which is incredibly difficult to parse and lacks detail, and often lacks motivation and descriptions for experiments and components. In particular:
* First, and most importantly, the proposed SNP/SNP++ approach is never really formalized, with the base text just referencing large, ad-hoc algorithm chunks without providing any actionable information on the method details, and what actually happens. Similarly, subnetworks, as used an
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
