Continual Fine-Tuning with Provably Accurate and Parameter-Free Task Retrieval

Hang Thi-Thuy Le; Long Minh Bui; Minh Hoang; Trong Nghia Hoang

arXiv:2603.13235·cs.LG·March 17, 2026

Open Access·3 Reviews

TL;DR

This paper introduces a parameter-adaptation method for continual fine-tuning that combines adaptive use of input embeddings at test time with parameter-free task retrieval, backed by task-retrieval error bounds and by experiments in which it outperforms existing approaches.

Contribution

It proposes a new method that integrates adaptive input embeddings with parameter-free retrieval, supported by theoretical guarantees and improved performance in continual learning.

Findings

01

Theoretical error bounds relate retrieval accuracy to task cluster structure.

02

The method improves retrieval and prediction under large task shifts.

03

Experimental results demonstrate superior performance over existing approaches.
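
Finding 01 can be made concrete with a textbook two-cluster case. This is an illustrative special case under assumptions not taken from the paper (two isotropic Gaussian clusters with means $\mu_1$, $\mu_2$, shared variance $\sigma^2$, equal priors, and maximum-likelihood assignment); it is not the paper's bound. With $\Phi$ the standard normal CDF, the probability of retrieving the wrong cluster is

$$
P_{\text{err}} = \Phi\!\left(-\frac{\lVert \mu_1 - \mu_2 \rVert}{2\sigma}\right),
$$

so the error falls as the cluster centers move apart relative to their spread. For example, a separation of $4\sigma$ gives $\Phi(-2) \approx 0.023$.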

Abstract

Continual fine-tuning aims to adapt a pre-trained backbone to new tasks sequentially while preserving performance on earlier tasks whose data are no longer available. Existing approaches fall into two categories: input-adaptation and parameter-adaptation. Input-adaptation methods rely on retrieving the most relevant prompts at test time, but require continuously learning a retrieval function that is prone to forgetting. Parameter-adaptation methods instead use a fixed input embedding function to enable retrieval-free prediction and avoid forgetting, but sacrifice representation adaptability. To combine the strengths of both, we propose a new parameter-adaptation method that enables adaptive use of input embeddings at test time with parameter-free retrieval. We derive task-retrieval error bounds for a clustering-based, parameter-free paradigm, providing theoretical guarantees that…
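
The clustering-based, parameter-free retrieval idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each task is summarized by a single diagonal Gaussian fitted once to that task's embeddings (the reviews below mention Gaussian mixture signatures), and all names (`fit_signature`, `retrieve_task`, the toy task centers) are hypothetical. Once stored, the statistics are never trained again, so no retrieval function is learned that could be forgotten.

```python
import math
import random

def fit_signature(embeddings):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to one task's embeddings."""
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(x[d] for x in embeddings) / n for d in range(dim)]
    # A small constant keeps the variance strictly positive.
    var = [sum((x[d] - mean[d]) ** 2 for x in embeddings) / n + 1e-6 for d in range(dim)]
    return mean, var

def log_likelihood(x, signature):
    """Log-density of x under a diagonal Gaussian signature."""
    mean, var = signature
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def retrieve_task(x, signatures):
    """Return the task id whose stored signature gives x the highest likelihood."""
    return max(signatures, key=lambda t: log_likelihood(x, signatures[t]))

def sample_task(center, n, rng, scale=0.5):
    """Toy stand-in for embeddings of one task: Gaussian noise around a center."""
    return [[rng.gauss(c, scale) for c in center] for _ in range(n)]

rng = random.Random(0)
# Hypothetical, well-separated task centers in a 2-D embedding space.
centers = {"task_a": [0.0, 0.0], "task_b": [5.0, 5.0], "task_c": [-5.0, 5.0]}

# Tasks arrive one at a time; only the fitted statistics are kept, not the data.
signatures = {t: fit_signature(sample_task(c, 200, rng)) for t, c in centers.items()}

# At test time, retrieval is an argmax over stored likelihoods.
queries = [(t, x) for t, c in centers.items() for x in sample_task(c, 50, rng)]
accuracy = sum(retrieve_task(x, signatures) == t for t, x in queries) / len(queries)
print(f"retrieval accuracy: {accuracy:.2f}")
```

In this toy setup retrieval is easy because the clusters are far apart relative to their spread. Moving the centers closer together or raising `scale` makes misretrievals appear, which is the kind of dependence on cluster structure that the error bounds are described as capturing.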

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 2

Strengths

- The authors try to mathematically justify the proposed approach.
- The proposed method outperforms the baselines.

Weaknesses

- The paper is poorly written and difficult to follow, even after multiple readings. It would benefit from substantial restructuring, rewriting, and clearer explanations throughout.
- The distinction between input-adaptation and parameter-adaptation is unclear. From the paper, it seems that input-adaptation corresponds to prompt-tuning and parameter-adaptation to LoRA-based fine-tuning, but these categories are only briefly mentioned in the abstract and never clearly defined or elaborated upon.

Reviewer 02·Rating 6·Confidence 3

Strengths

1. The paper provides a novel application of Gaussian Mixture Models to CFT, improving on the prior state of the art in CFT with LoRA-based adapters.
2. The paper provides solid theoretical backing for its approach.
3. The paper provides empirical evidence that its method consistently outperforms the previous state of the art across a diverse set of datasets.

Weaknesses

1. The paper claims to alleviate two key issues: limited representation adaptability, and forgetting in retrieval-based methods. However, the latter is a prominent problem in prompt-based methods, not in parametric CFT methods (such as PROTEUS), and previous parametric CFT approaches (RanPAC, InfLoRA, SD-LoRA) already address it with parameter-free retrieval methods.

Reviewer 03·Rating 6·Confidence 4

Strengths

1. Clearly identifies the problem of retriever forgetting in prompt/parameter-pool based CFT methods and proposes a novel parameter-free retrieval mechanism as a direct solution.
2. Theoretical Foundation: Provides a non-trivial theoretical analysis linking the retrieval error rate to geometric properties (cluster separation factor $\delta$) of the learned representation signatures. This offers valuable insight and principled guidance for the algorithmic design. Theorem 3.4 and 3.5 are signifi

Weaknesses

1. High System Complexity: The overall PROTEUS framework is quite intricate, involving adaptive LoRA with orthogonality constraints, non-parametric GMM fitting (DP-GMM) for potentially many components per task, storing these GMM parameters as signatures, computing likelihoods against all signatures during retrieval, and finally performing LDA prediction. This complexity raises concerns about implementation difficulty, computational overhead (especially GMM fitting and retrieval), and potential f

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications