Class-Conditional Neural Polarizer: A Lightweight and Effective Backdoor Defense by Purifying Poisoned Features

Mingli Zhu; Shaokui Wei; Hongyuan Zha; Baoyuan Wu

arXiv:2502.18520·cs.CR·February 27, 2025

Open Access

TL;DR

This paper introduces CNPD, a lightweight, class-conditional backdoor defense that effectively purifies poisoned features by integrating a neural polarizer with class-aware mechanisms, improving robustness without requiring label estimation.

Contribution

The paper proposes CNPD, a novel class-conditional neural polarizer-based defense that enhances backdoor mitigation by incorporating class information, overcoming limitations of previous methods like NPD.

Findings

01. CNPD effectively reduces backdoor effects in neural networks.

02. Class-conditional mechanisms improve purification accuracy.

03. The approach maintains benign performance while defending against backdoors.

Abstract

Recent studies have highlighted the vulnerability of deep neural networks to backdoor attacks, where models are manipulated to rely on embedded triggers within poisoned samples, despite the presence of both benign and trigger information. While several defense methods have been proposed, they often struggle to balance backdoor mitigation with maintaining benign performance. In this work, inspired by the optical polarizer, which allows light waves of specific polarizations to pass while filtering out others, we propose a lightweight backdoor defense approach, NPD. This method integrates a neural polarizer (NP) as an intermediate layer within the compromised model, implemented as a lightweight linear transformation optimized via bi-level optimization. The learnable NP filters trigger information from poisoned samples while preserving benign content. Despite its effectiveness, we…
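To make the idea of an NP as an inserted linear layer more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class `LinearPolarizer`, the toy classifier head `HEAD`, the feature values, and the choice of which feature acts as a "trigger" direction are all hypothetical. The bi-level optimization that the abstract says is used to learn the NP is omitted; the "trained" weights are set by hand purely to show the intended filtering effect.

```python
# Hypothetical sketch: a neural polarizer modeled as a single linear map
# placed between a frozen feature extractor and a frozen classifier head.
# All names, shapes, and numbers are illustrative assumptions.


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


class LinearPolarizer:
    """Linear transform f -> W f + b, initialized to the identity map."""

    def __init__(self, dim):
        self.weight = identity(dim)
        self.bias = [0.0] * dim

    def __call__(self, features):
        projected = matvec(self.weight, features)
        return [y + b for y, b in zip(projected, self.bias)]


def classify(head_weights, features):
    """Return the index of the largest logit of a linear classifier head."""
    logits = matvec(head_weights, features)
    return max(range(len(logits)), key=logits.__getitem__)


# Frozen toy head: 2 classes, 3 features. Class 0 plays the attacker's
# target class and leans heavily on feature index 2 (the assumed trigger).
HEAD = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
]

features = [0.2, 0.9, 0.5]  # intermediate features of a "poisoned" input

polarizer = LinearPolarizer(dim=3)
before = classify(HEAD, features)                     # no polarizer
after_identity = classify(HEAD, polarizer(features))  # untrained polarizer

# Stand-in for training: zero out the assumed trigger direction by hand.
polarizer.weight[2][2] = 0.0
after_filter = classify(HEAD, polarizer(features))

print(before, after_identity, after_filter)
```

Initializing the map to the identity means that inserting the layer does not change the model's predictions until its weights are updated, which is one plausible way to keep the starting point harmless; whether the paper uses this initialization is not stated in the abstract.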

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Wireless Signal Modulation Classification

Methods: Softmax · Attention Is All You Need