Partial train and isolate, mitigate backdoor attack

Yong Li; Han Gao

arXiv:2405.16488·cs.CV·June 7, 2024

Open Access

TL;DR

This paper introduces a new training method that isolates suspicious samples and fine-tunes models to effectively defend against backdoor attacks in neural networks.

Contribution

The paper proposes a partial training approach that isolates potential backdoor samples and enhances model robustness through fine-tuning, improving backdoor mitigation.

Findings

01. The method effectively isolates backdoor samples during training.

02. Fine-tuning improves model resistance to backdoor attacks.

03. The approach maintains high accuracy on normal samples.

Abstract

Neural networks are widely known to be vulnerable to backdoor attacks, in which an attacker poisons a portion of the training data so that the target model performs well on normal data while outputting attacker-specified or random categories on poisoned samples. Backdoor attacks pose a serious threat: poisoned samples are becoming increasingly similar to their normal counterparts, to the point that even the human eye cannot easily distinguish them, and models carrying backdoors are as accurate on normal samples as clean models. In this article, based on observed characteristics of backdoor attacks, we provide a new model training method (PT) that freezes part of the model in order to train a model that can isolate suspicious samples. On this basis, a clean model is then fine-tuned to resist backdoor attacks.
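
The two ideas named in the abstract, training with part of the model frozen and then isolating suspicious samples, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which parts are frozen or how samples are flagged, so the toy linear classifier, the `frozen` mask, and the lowest-loss isolation criterion (borrowed from loss-based isolation defenses) are all assumptions, and the function names are hypothetical. The final fine-tuning stage is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_loss(w, x, y):
    """Binary cross-entropy of a linear classifier on a single sample."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def partial_train(w, frozen, data, lr=0.1, epochs=5):
    """SGD that updates only weights whose `frozen` flag is False.

    Which weights to freeze is an assumption here, not taken from the paper.
    """
    w = list(w)  # copy so the caller's weights are not modified
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                if not frozen[i]:
                    w[i] -= lr * err * xi
    return w

def isolate(w, data, ratio=0.1):
    """Indices of the `ratio` fraction of samples with the lowest loss.

    Assumed criterion: samples the partially trained model fits most
    easily are treated as suspicious.
    """
    losses = [per_sample_loss(w, x, y) for x, y in data]
    k = max(1, int(len(data) * ratio))
    return sorted(range(len(data)), key=losses.__getitem__)[:k]
```

Under these assumptions, a caller would run `partial_train` on the full (possibly poisoned) training set, pass the resulting weights to `isolate`, and hold out the returned indices before fine-tuning a clean model on the remaining data.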

Taxonomy

Topics: Advanced Malware Detection Techniques