MBTSAD: Mitigating Backdoors in Language Models Based on Token Splitting and Attention Distillation

Yidong Ding; Jiafei Niu; Ping Yi

arXiv:2501.02754 · cs.CR · January 7, 2025

TL;DR

MBTSAD mitigates backdoors in language models using token splitting and attention distillation. It works without access to the pre-trained weights and improves model robustness by generating out-of-distribution (OOD) data.

Contribution

Proposes MBTSAD, a backdoor mitigation approach that needs only a small subset of clean data, does not require pre-trained weights, and uses token splitting and attention distillation to improve robustness.

Findings

01

Achieves backdoor mitigation performance comparable to that of methods that rely on pre-trained weights.

02

Effectively eliminates backdoor patterns while maintaining clean data performance.

03

Generates OOD data that helps the model learn generalized features.
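The summary does not spell out the exact token-splitting rule. The sketch below assumes, for illustration only, that "token splitting" means breaking a word into two pieces at a random interior position, so that a subword tokenizer sees unfamiliar token sequences while the sample keeps its original label. The function names and the `ratio` and minimum-length parameters are hypothetical, not taken from the paper.

```python
import random
from typing import List, Tuple


def split_token(word: str, rng: random.Random, min_len: int = 4) -> str:
    """Split one word into two non-empty pieces (hypothetical splitting rule)."""
    if len(word) < min_len:
        return word  # leave short words such as "the" intact
    cut = rng.randint(1, len(word) - 1)  # interior cut point, both halves non-empty
    return word[:cut] + " " + word[cut:]


def token_split_augment(sentence: str, ratio: float = 0.5, seed: int = 0) -> str:
    """Return a copy of `sentence` where each word is split with probability `ratio`."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        if rng.random() < ratio:
            out.append(split_token(word, rng))
        else:
            out.append(word)
    return " ".join(out)


def build_split_dataset(
    clean: List[Tuple[str, int]], ratio: float = 0.5, seed: int = 0
) -> List[Tuple[str, int]]:
    """Map each clean (text, label) pair to a token-split copy with the same label."""
    return [
        (token_split_augment(text, ratio, seed + i), label)
        for i, (text, label) in enumerate(clean)
    ]
```

Under these assumptions, the characters of each sentence are preserved and only the word boundaries move, which is one simple way to produce inputs that look different to the tokenizer while keeping the labels of the small clean subset.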

Abstract

In recent years, attention-based models have excelled across various domains but remain vulnerable to backdoor attacks, often introduced through downloading or fine-tuning on poisoned datasets. Many current backdoor mitigation methods for NLP models rely on the pre-trained (not fine-tuned) weights, but they fail when these weights are unavailable. In this work, we propose MBTSAD, which mitigates backdoors in a language model using only a small subset of clean data and without requiring pre-trained weights. Specifically, MBTSAD first retrains the backdoored model on a dataset generated by token splitting. It then applies attention distillation, with the retrained model as the teacher and the original backdoored model as the student. Experimental results demonstrate that MBTSAD achieves backdoor mitigation performance comparable to that of the methods based…
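The abstract names attention distillation but does not give the loss. As a minimal sketch, assume each model exposes per-head attention probability matrices (softmax over raw scores) and the student is trained to match the teacher's maps under a mean squared error, added to the ordinary task loss on clean data with a weight `beta`. The function names, the MSE choice, and `beta` are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix) -> Matrix:
    """Row-wise softmax: turns raw attention scores into attention probabilities."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_distillation_loss(teacher: List[Matrix], student: List[Matrix]) -> float:
    """Mean squared difference between teacher and student attention maps.

    Each argument is a list of per-head seq x seq probability matrices with
    matching shapes (assumed layout, for illustration only).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of heads")
    total, count = 0.0, 0
    for t_map, s_map in zip(teacher, student):
        for t_row, s_row in zip(t_map, s_map):
            for t, s in zip(t_row, s_row):
                total += (t - s) ** 2
                count += 1
    if count == 0:
        raise ValueError("attention maps are empty")
    return total / count


def distillation_objective(
    ce_loss: float, teacher: List[Matrix], student: List[Matrix], beta: float = 1.0
) -> float:
    """Hypothetical total objective: clean-data task loss plus weighted attention term."""
    return ce_loss + beta * attention_distillation_loss(teacher, student)
```

In a real training loop the teacher's maps would be treated as fixed targets and only the student (the original backdoored model) would be updated, so that its attention patterns move toward those of the retrained model while the task loss preserves clean-data accuracy.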

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare

Methods: Softmax · Attention Is All You Need