Task-Agnostic Detector for Insertion-Based Backdoor Attacks

Weimin Lyu; Xiao Lin; Songzhu Zheng; Lu Pang; Haibin Ling; Susmit Jha; Chao Chen

arXiv:2403.17155 · cs.CL · March 27, 2024 · 1 citation

TL;DR

This paper introduces TABDet, a task-agnostic backdoor detection method for NLP that uses final layer logits and pooling to effectively identify backdoors across various NLP tasks, surpassing traditional task-specific approaches.

Contribution

The paper presents TABDet, a novel task-agnostic detection approach that unifies logit representations across multiple NLP tasks, improving backdoor detection effectiveness.

Findings

01. TABDet outperforms traditional task-specific detection methods.

02. It effectively detects backdoors in diverse NLP tasks.

03. The method is efficient and task-agnostic.

Abstract

Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final layer logits combined with an efficient pooling technique, enabling unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
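
To make the idea of a unified logit representation concrete, the sketch below pools final-layer logits of different shapes into one fixed-length vector that a single detector could consume. The abstract does not specify the pooling technique, so the quantile pooling used here, the helper name `pool_logits`, the parameter `n_quantiles`, and the example tensor shapes are all assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def pool_logits(logits: np.ndarray, n_quantiles: int = 5) -> np.ndarray:
    """Pool a logit array of any shape into a fixed-length vector.

    Hypothetical helper (not from the paper): take evenly spaced
    quantiles over the last axis, then average over every other axis
    (samples, tokens, start/end heads).
    """
    qs = np.linspace(0.0, 1.0, n_quantiles)
    # Result has shape (n_quantiles, *logits.shape[:-1]).
    per_vector = np.quantile(logits, qs, axis=-1)
    # Collapse all remaining axes and average them away.
    return per_vector.reshape(n_quantiles, -1).mean(axis=1)


rng = np.random.default_rng(0)

# Assumed example shapes for three task types (illustrative only).
sc_logits = rng.normal(size=(20, 2))        # classification: samples x classes
ner_logits = rng.normal(size=(20, 16, 9))   # NER: samples x tokens x labels
qa_logits = rng.normal(size=(20, 2, 128))   # QA: samples x start/end x positions

features = np.stack(
    [pool_logits(x) for x in (sc_logits, ner_logits, qa_logits)]
)
print(features.shape)
```

Because every model is reduced to a vector of the same length regardless of task, the stacked rows could be fed to one shared classifier, which is the sense in which such a detector would be task-agnostic.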

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Network Security and Intrusion Detection