Reducing catastrophic forgetting of incremental learning in the absence of rehearsal memory with task-specific token
Young Jo Choi, Min Kyoon Yoo, Yu Rang Park

TL;DR
This paper introduces a task-specific token method for vision transformers that mitigates catastrophic forgetting in incremental learning without storing past data, avoiding the privacy and security concerns that come with retaining it.
Contribution
The proposed approach uses task-specific tokens to encapsulate the knowledge of each task, enabling incremental learning without rehearsal memory. On benchmark datasets it reports the highest accuracy and the lowest backward transfer among the compared methods.
Findings
Achieved highest accuracy among compared methods
Lowest backward transfer indicating minimal forgetting
Effective knowledge preservation without data rehearsal
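Backward transfer is not defined on this page, so the sketch below assumes one widely used definition: the average change in accuracy on each earlier task between the moment it was learned and the end of training. The function name and the accuracy values are illustrative and are not taken from the paper. Under this assumed definition, negative values indicate forgetting and values closer to zero indicate less of it.

```python
def backward_transfer(acc):
    """Average accuracy change on earlier tasks after all tasks are learned.

    acc[i][j] is the accuracy on task j measured after training on task i.
    Assumed definition: mean over j < T-1 of acc[T-1][j] - acc[j][j].
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    final_row = acc[-1]
    # Compare final accuracy on each earlier task with its accuracy
    # right after that task was learned.
    diffs = [final_row[j] - acc[j][j] for j in range(num_tasks - 1)]
    return sum(diffs) / (num_tasks - 1)


# Made-up accuracies for three tasks (not results from the paper).
acc = [
    [0.90, 0.00, 0.00],
    [0.85, 0.80, 0.00],
    [0.80, 0.70, 0.75],
]
bwt = backward_transfer(acc)  # both earlier tasks lost about 0.10 accuracy
```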
Abstract
Deep learning models generally display catastrophic forgetting when learning new data continuously. Many incremental learning approaches address this problem by reusing data from previous tasks while learning new tasks. However, the direct access to past data generates privacy and security concerns. To address these issues, we present a novel method that preserves previous knowledge without storing previous data. This method is inspired by the architecture of a vision transformer and employs a unique token capable of encapsulating the compressed knowledge of each task. This approach generates task-specific embeddings by directing attention differently based on the task associated with the data, thereby effectively mimicking the impact of having multiple models through tokens. Our method incorporates a distillation process that ensures efficient interactions even after multiple…
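The idea of a per-task token that steers attention can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: it assumes a single attention head with identity query, key, and value projections, and the names `task_embedding`, `task_tokens`, and `patches` are hypothetical. It shows only that the same patch embeddings produce different outputs depending on which task token is prepended.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def task_embedding(task_token, patches):
    """Self-attention output read at the task-token position.

    Assumption: queries, keys, and values are the raw vectors (identity
    projections); a real vision transformer uses learned projections.
    """
    sequence = [task_token] + patches  # prepend the task token
    scale = math.sqrt(len(task_token))
    # The task token acts as the query; every sequence element is a key.
    scores = [
        sum(q * k for q, k in zip(task_token, key)) / scale
        for key in sequence
    ]
    weights = softmax(scores)
    # The attention-weighted sum of values is the task-specific embedding.
    return [
        sum(w * vec[d] for w, vec in zip(weights, sequence))
        for d in range(len(task_token))
    ]


# Hypothetical tokens for two tasks (learned in practice, hand-picked here).
task_tokens = {0: [1.0, 0.0], 1: [0.0, 1.0]}
patches = [[2.0, 0.0], [0.0, 2.0]]  # the same input for both tasks

emb_task0 = task_embedding(task_tokens[0], patches)
emb_task1 = task_embedding(task_tokens[1], patches)
```

Each token pulls attention toward the patch it aligns with, so `emb_task0` is dominated by the first dimension and `emb_task1` by the second, even though the input patches are identical.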
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Ferroelectric and Negative Capacitance Devices
Methods: Attention Is All You Need · Softmax · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Residual Connection · Vision Transformer
