FCL-ViT: Task-Aware Attention Tuning for Continual Learning

Anestis Kaimakamidis; Ioannis Pitas

arXiv:2412.02509·cs.AI·August 19, 2025

Open Access

TL;DR

FCL-ViT introduces a feedback mechanism in Vision Transformers for continual learning, dynamically tuning attention features to adapt to new tasks while outperforming existing methods.

Contribution

The paper proposes FCL-ViT, a novel feedback-based Vision Transformer architecture with task-aware attention tuning for improved continual learning performance.

Findings

01. Surpasses state-of-the-art in continual learning benchmarks.

02. Uses fewer trainable parameters than existing methods.

03. Effectively adapts attention features to new tasks.

Abstract

Continual Learning (CL) involves adapting the prior knowledge of a Deep Neural Network (DNN) to new tasks without forgetting the old ones. However, modern CL techniques focus on adding memory capabilities to existing DNN models rather than designing new models that can adapt to the task at hand. This paper presents the novel Feedback Continual Learning Vision Transformer (FCL-ViT), which uses a feedback mechanism to generate real-time dynamic attention features tailored to the current task. The FCL-ViT operates in two phases. In phase 1, generic image features are produced; they determine where the Transformer should attend in the current image. In phase 2, task-specific image features are generated that leverage dynamic attention. To this end, Tunable self-Attention Blocks (TABs) and Task Specific Blocks (TSBs) are introduced that operate in both phases and are…
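
The two-phase idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated and does not specify how TABs or TSBs work internally. The sketch assumes, purely for illustration, that the feedback signal is a per-feature scale and shift applied before attention, and that a task-specific block is a mean-pool followed by a `tanh` gate. The names `self_attention`, `task_signal`, `two_phase_forward`, and `task_weights` are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, scale=None, shift=None):
    """Single-head self-attention with identity Q/K/V projections.

    ASSUMPTION: `scale` and `shift` stand in for a tuning signal; each
    token feature is modulated as scale * x + shift before attention.
    """
    dim = len(tokens[0])
    if scale is None:
        scale = [1.0] * dim
    if shift is None:
        shift = [0.0] * dim
    tuned = [[s * x + b for x, s, b in zip(tok, scale, shift)] for tok in tokens]
    out = []
    for q in tuned:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tuned]
        weights = softmax(scores)
        # Weighted sum of value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tuned)) for j in range(dim)])
    return out

def task_signal(features, task_weights):
    """ASSUMPTION: toy stand-in for a task-specific block.

    Mean-pools the phase-1 features and maps them to a per-feature scale.
    """
    dim = len(features[0])
    pooled = [sum(tok[j] for tok in features) / len(features) for j in range(dim)]
    scale = [1.0 + math.tanh(w * p) for w, p in zip(task_weights, pooled)]
    shift = [0.0] * dim
    return scale, shift

def two_phase_forward(tokens, task_weights):
    """Phase 1: generic features. Feedback: tuning signal. Phase 2: tuned features."""
    generic = self_attention(tokens)
    scale, shift = task_signal(generic, task_weights)
    specific = self_attention(tokens, scale, shift)
    return generic, specific

tokens = [[1.0, 0.0], [0.0, 1.0]]
generic, specific = two_phase_forward(tokens, task_weights=[0.5, -0.5])
```

With all-zero `task_weights` the tuning signal is the identity, so phase 2 reproduces phase 1; non-zero weights change the phase-2 features while leaving the phase-1 features untouched. That is the only property this sketch is meant to show.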

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Absolute Position Encodings · Adam · Softmax · Label Smoothing · Dropout · Dense Connections · Layer Normalization · Linear Layer · Multi-Head Attention