Attack On Prompt: Backdoor Attack in Prompt-Based Continual Learning
Trang Nguyen, Anh Tran, Nhat Ho

TL;DR
This paper presents a backdoor attack on prompt-based continual learning models, showing how an adversary can implant a trigger that steers the model toward an adversarial target whenever the trigger is present, while performance on clean data stays normal.
Contribution
It introduces a backdoor attack framework tailored to prompt-based continual learning and proposes a solution for each of three challenges the attack faces on incremental learners: transferability, resiliency, and authenticity.
Findings
Achieves up to 100% attack success rate in experiments.
Demonstrates robustness of backdoor triggers during incremental learning.
Validates effectiveness across various datasets and models.
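The attack success rate quoted above is usually reported alongside clean accuracy. The sketch below shows one common way to compute both; it is an illustration under stated assumptions, not the paper's evaluation code. The helper names (`attack_success_rate`, `clean_accuracy`), the toy classifier, and the toy trigger are all hypothetical, and the paper's exact metric definition may differ.

```python
from typing import Callable, List, Sequence

Sample = List[float]


def attack_success_rate(
    predict: Callable[[Sample], int],
    inputs: Sequence[Sample],
    labels: Sequence[int],
    apply_trigger: Callable[[Sample], Sample],
    target_label: int,
) -> float:
    """Fraction of non-target samples classified as the target once triggered."""
    hits = 0
    total = 0
    for x, y in zip(inputs, labels):
        if y == target_label:
            # Samples already in the target class say nothing about the attack.
            continue
        total += 1
        if predict(apply_trigger(x)) == target_label:
            hits += 1
    return hits / total if total else 0.0


def clean_accuracy(
    predict: Callable[[Sample], int],
    inputs: Sequence[Sample],
    labels: Sequence[int],
) -> float:
    """Ordinary accuracy on untouched samples."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels) if labels else 0.0


# Hypothetical toy setup: the last feature acts as the trigger slot.
TRIGGER_VALUE = 9.0
TARGET = 0


def toy_trigger(x: Sample) -> Sample:
    """Return a copy of x with the trigger written into the last feature."""
    return x[:-1] + [TRIGGER_VALUE]


def toy_backdoored_model(x: Sample) -> int:
    """Stand-in for a poisoned classifier: obeys the trigger, else thresholds x[0]."""
    if x[-1] == TRIGGER_VALUE:
        return TARGET
    return 1 if x[0] > 0.5 else 0


inputs = [[0.9, 0.0], [0.1, 0.0], [0.8, 0.0], [0.2, 0.0]]
labels = [1, 0, 1, 0]

asr = attack_success_rate(toy_backdoored_model, inputs, labels, toy_trigger, TARGET)
acc = clean_accuracy(toy_backdoored_model, inputs, labels)
print(f"attack success rate: {asr:.2f}, clean accuracy: {acc:.2f}")
```

Excluding samples whose true label already equals the target keeps the rate from being inflated by predictions that would have been correct without any trigger.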
Abstract
Prompt-based approaches offer a cutting-edge solution to data privacy issues in continual learning, particularly in scenarios involving multiple data suppliers where long-term storage of private user data is prohibited. Despite delivering state-of-the-art performance, its impressive remembering capability can become a double-edged sword, raising security concerns as it might inadvertently retain poisoned knowledge injected during learning from private user data. Following this insight, in this paper, we expose continual learning to a potential threat: backdoor attack, which drives the model to follow a desired adversarial target whenever a specific trigger is present while still performing normally on clean samples. We highlight three critical challenges in executing backdoor attacks on incremental learners and propose corresponding solutions: (1) Transferability: We employ a…
Taxonomy
Topics: Geophysical Methods and Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
