Unlocking [CLS] Features for Continual Post-Training

Murat Onur Yildirim; Elif Ceren Gok Yildirim; Joaquin Vanschoren

arXiv:2502.14762·cs.LG·February 20, 2026

TL;DR

This paper introduces TOSCA, a parameter-efficient method that enhances continual learning by adapting only the [CLS] token with sparse calibration, achieving state-of-the-art results with significantly fewer parameters.

Contribution

The paper proposes TOSCA, a novel sparse calibration approach that fine-tunes only the [CLS] token for continual learning, balancing stability and plasticity efficiently.

Findings

01. TOSCA achieves state-of-the-art performance in continual learning tasks.

02. It uses approximately 8 times fewer parameters than previous methods.

03. The approach maintains foundation model generalization while adapting effectively.

Abstract

Continual learning requires models to integrate new classes or domains over time while preserving previously acquired knowledge. Within this paradigm, foundation models often achieve strong performance, but they remain subject to the stability-plasticity trade-off: excessive plasticity leads to forgetting of prior knowledge, while excessive stability constrains adaptation. This necessitates an effective post-training strategy that introduces minimal yet functional modifications. To address this challenge, we first introduce a new parameter-efficient fine-tuning module, 'Learn and Calibrate' (LuCA), designed to acquire task-specific knowledge through an adapter-calibrator couple, enabling well-refined feature representations. Then, for each task, we deploy a sparse LuCA module on top of the last classification token [CLS] just before the classifier, which we refer to as…
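To make the adapter-calibrator idea concrete, here is a minimal sketch of such a module applied to a single [CLS] feature vector. The abstract above does not give the exact equations, so everything in the code is an assumption for illustration: the residual bottleneck form of the adapter, the sigmoid gate used as the calibrator, the function names (`init_bottleneck`, `luca_like`), and the dimensions. Sparsity and training are not modeled; the sketch only shows one way two small modules could refine a frozen backbone's [CLS] output before it reaches the classifier.

```python
import math
import random


def linear(x, w, b):
    """Return w @ x + b for a vector x, a row-major matrix w, and a bias b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def init_bottleneck(dim, hidden, rng, zero_up=False):
    """Hypothetical down/up projection pair (dim -> hidden -> dim)."""
    down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(hidden)]
    up = [[0.0 if zero_up else rng.gauss(0.0, 0.02) for _ in range(hidden)]
          for _ in range(dim)]
    return {"down": down, "b_down": [0.0] * hidden, "up": up, "b_up": [0.0] * dim}


def bottleneck(x, p):
    """Down-project, apply ReLU, then up-project back to the input size."""
    return linear(relu(linear(x, p["down"], p["b_down"])), p["up"], p["b_up"])


def luca_like(cls_feature, adapter, calibrator):
    """Assumed adapter-calibrator pair acting only on the [CLS] feature."""
    # Adapter (assumed form): residual bottleneck update of the feature.
    adapted = [x + d for x, d in zip(cls_feature, bottleneck(cls_feature, adapter))]
    # Calibrator (assumed form): element-wise sigmoid gate on the adapted feature.
    gate = sigmoid(bottleneck(adapted, calibrator))
    return [a * g for a, g in zip(adapted, gate)]


rng = random.Random(0)
dim, hidden = 8, 2  # illustrative sizes, not taken from the paper
adapter = init_bottleneck(dim, hidden, rng, zero_up=True)
calibrator = init_bottleneck(dim, hidden, rng, zero_up=True)

cls_feature = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a backbone output
refined = luca_like(cls_feature, adapter, calibrator)
```

With the up-projections initialized to zero, the adapter starts as the identity and the gate starts at 0.5 for every dimension, so the module initially only rescales the [CLS] feature. Because the module touches one vector per input rather than every token in every layer, its parameter count depends only on the feature and bottleneck sizes.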

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Assessment and Pedagogy