LCA: Local Classifier Alignment for Continual Learning

Tung Tran; Danilo Vasconcellos Vargas; Khoat Than

arXiv:2603.09888 · cs.AI · March 12, 2026



Open Access · 3 Reviews

TL;DR

This paper introduces Local Classifier Alignment (LCA), a novel loss function that improves continual learning by aligning classifiers with the backbone, leading to better generalization and robustness across multiple tasks.

Contribution

The paper proposes the LCA loss to address classifier-backbone mismatch in continual learning, enhancing performance and robustness with theoretical and empirical validation.

Findings

01 LCA improves classifier alignment and generalization.

02 The method achieves state-of-the-art results on standard benchmarks.

03 LCA enhances robustness in continual learning scenarios.

Abstract

A fundamental requirement for intelligent systems is the ability to learn continuously under changing environments. However, models trained in this regime often suffer from catastrophic forgetting. Leveraging pre-trained models has recently emerged as a promising solution, since their generalized feature extractors enable faster and more robust adaptation. While some earlier works mitigate forgetting by fine-tuning only on the first task, this approach quickly deteriorates as the number of tasks grows and the data distributions diverge. More recent research instead seeks to consolidate task knowledge into a unified backbone, or to adapt the backbone as new tasks arrive. However, such approaches may create a mismatch between task-specific classifiers and the adapted backbone. To address this issue, we propose a novel Local Classifier Alignment (LCA) loss to…
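
The classifier-backbone mismatch mentioned in the abstract can be illustrated with a toy example. The sketch below is not the LCA loss and is not taken from the paper. It assumes a hypothetical 2-D "backbone" that simply rotates its input, and a nearest-prototype classifier head. All names (`backbone`, `fit_prototypes`, `old_angle`, `new_angle`) are invented for illustration.

```python
import math

def backbone(x, angle):
    """Toy feature extractor (assumption): rotate a 2-D input by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def fit_prototypes(samples, labels, angle):
    """Toy classifier head (assumption): one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        f = backbone(x, angle)
        sx, sy = sums.get(y, (0.0, 0.0))
        sums[y] = (sx + f[0], sy + f[1])
        counts[y] = counts.get(y, 0) + 1
    return {y: (sums[y][0] / counts[y], sums[y][1] / counts[y]) for y in sums}

def predict(x, angle, prototypes):
    """Assign the class whose prototype is closest to the feature of x."""
    f = backbone(x, angle)
    return min(
        prototypes,
        key=lambda y: (f[0] - prototypes[y][0]) ** 2 + (f[1] - prototypes[y][1]) ** 2,
    )

def accuracy(samples, labels, angle, prototypes):
    hits = sum(predict(x, angle, prototypes) == y for x, y in zip(samples, labels))
    return hits / len(samples)

# Two well-separated classes along the first axis.
samples = [(1.0, 0.1), (0.9, -0.1), (1.1, 0.0), (-1.0, 0.1), (-0.9, -0.1), (-1.1, 0.0)]
labels = [0, 0, 0, 1, 1, 1]

old_angle = 0.0        # backbone when the head was fitted
new_angle = math.pi    # backbone after a (hypothetical) later adaptation

head_old = fit_prototypes(samples, labels, old_angle)
acc_matched = accuracy(samples, labels, old_angle, head_old)      # head and backbone agree
acc_mismatched = accuracy(samples, labels, new_angle, head_old)   # stale head, new backbone

head_new = fit_prototypes(samples, labels, new_angle)             # refit head on new features
acc_realigned = accuracy(samples, labels, new_angle, head_new)

print(acc_matched, acc_mismatched, acc_realigned)
```

Under these assumptions the head fitted on the old features classifies every sample correctly before the backbone changes, fails on every sample once the features are rotated, and recovers after being refit on the new features. Refitting stands in for alignment here only to make the mismatch visible; how LCA itself aligns the classifier is not shown.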

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The paper is well written and easy to follow. In particular, the authors provide multiple visualisations to aid the reader's understanding.
- Beyond empirical evidence, the authors also provide theoretical analysis.

Weaknesses

- My major concern about this paper is novelty. To me, none of the proposed components appears novel, either in the broader deep learning literature or in the continual learning literature. For example, classifier alignment was already studied a few years ago in [R1, R2]. Applying model merging to continual learning is not a new idea either, as in [R3]. I cannot see what distinguishes the proposed method from the papers cited above.
- Since the pretrained model is

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. Applying model merging methods to CL is an emerging topic and has the potential to reduce forgetting for large-scale models.
2. The effectiveness of the LCA loss is justified both theoretically and experimentally.
3. The paper conducts thorough experiments with comparisons to recent CL baselines. The visualization of results is good.

Weaknesses

1. It is unclear how the LCA loss resolves the mismatch between classifiers and the merged feature extractor.
   - The LCA loss is computed only on in-task samples, reducing $\epsilon_i$ in Eq. 4. However, resolving the mismatch between classifier and feature extractor seems to be about reducing $L(\mathbf{D}, h_t)$. Although the LCA loss improves in-task robustness and lowers the upper bound on the loss, it is unclear how it addresses the mismatch across tasks.
   - It could be helpful to show more evidence of the claim ‘LCA can r

Reviewer 03 · Rating 4 · Confidence 4

Strengths

[S1] The paper clearly identifies and addresses a critical practical problem in PTM-based CIL methods: backbone-classifier misalignment that occurs when incrementally updating the backbone. This is a timely and important contribution to the CIL field.
[S2] Robustness and Generality: Demonstrates enhanced model stability through robustness tests and shows LCA can be complementarily applied to improve other CIL methods, proving its general applicability.

Weaknesses

[W1] Since the robustness penalty is a core contribution, this ablation is essential to distinguish its specific benefit from the classifier alignment component (first term), which SLCA also employs. Without this experiment, it remains unclear whether the improvements stem from the novel robustness penalty or merely from classifier alignment. I strongly encourage the authors to include this ablation, as it would clearly demonstrate the added value over existing approaches like SLCA and strengthe


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face recognition and analysis · Advanced Neural Network Applications