Class-Conditional Neural Polarizer: A Lightweight and Effective Backdoor Defense by Purifying Poisoned Features
Mingli Zhu, Shaokui Wei, Hongyuan Zha, Baoyuan Wu

TL;DR
This paper introduces CNPD, a lightweight, class-conditional backdoor defense that effectively purifies poisoned features by integrating a neural polarizer with class-aware mechanisms, improving robustness without requiring label estimation.
Contribution
The paper proposes CNPD, a novel class-conditional neural polarizer-based defense that enhances backdoor mitigation by incorporating class information, overcoming limitations of previous methods like NPD.
Findings
CNPD effectively reduces backdoor effects in neural networks.
Class-conditional mechanisms improve purification accuracy.
The approach maintains benign performance while defending against backdoors.
Abstract
Recent studies have highlighted the vulnerability of deep neural networks to backdoor attacks, where models are manipulated to rely on embedded triggers within poisoned samples, despite the presence of both benign and trigger information. While several defense methods have been proposed, they often struggle to balance backdoor mitigation with maintaining benign performance. In this work, inspired by the optical polarizer, which allows light waves of specific polarizations to pass while filtering out others, we propose a lightweight backdoor defense approach, NPD. This method integrates a neural polarizer (NP) as an intermediate layer within the compromised model, implemented as a lightweight linear transformation optimized via bi-level optimization. The learnable NP filters trigger information from poisoned samples while preserving benign content. Despite its effectiveness, we…
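The idea of a linear layer inserted between frozen stages of a compromised model can be illustrated with a toy sketch. This is not the authors' implementation: the names `Polarizer`, `project_out`, `head`, and `predict`, the three-dimensional features, and the assumption that the trigger occupies a single known feature axis are all hypothetical. In particular, the weights here are set by hand to a projection, whereas the abstract states that the real NP is learned via bi-level optimization.

```python
# Toy sketch only. All names and numbers are hypothetical, not from the paper.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

class Polarizer:
    """Linear map z -> W z + b applied to intermediate features."""
    def __init__(self, dim):
        self.W = identity(dim)  # identity init: inserting the layer changes nothing
        self.b = [0.0] * dim

    def __call__(self, z):
        return [y + b for y, b in zip(matvec(self.W, z), self.b)]

def project_out(direction):
    """Return I - d d^T / (d^T d), which removes the component along d."""
    n = len(direction)
    norm_sq = sum(d * d for d in direction)
    return [[(1.0 if i == j else 0.0) - direction[i] * direction[j] / norm_sq
             for j in range(n)] for i in range(n)]

def head(z):
    """Frozen toy classifier head; feature axis 2 acts as the backdoor channel."""
    return matvec([[1.0, 0.0, -5.0], [0.0, 1.0, 5.0]], z)

def predict(z, polarizer):
    scores = head(polarizer(z))
    return scores.index(max(scores))

benign = [2.0, 1.0, 0.0]
poisoned = [2.0, 1.0, 1.0]  # same content plus a trigger component on axis 2

np_layer = Polarizer(3)
pred_before_benign = predict(benign, np_layer)
pred_before_poisoned = predict(poisoned, np_layer)

# Hand-set stand-in for learned weights (assumption: trigger direction is known).
np_layer.W = project_out([0.0, 0.0, 1.0])
pred_after_benign = predict(benign, np_layer)
pred_after_poisoned = predict(poisoned, np_layer)
```

In this toy setting the poisoned input flips to the attacker's class before the projection is applied and returns to the benign class afterwards, while the benign prediction is unchanged. A real defense would not know the trigger direction and would have to learn the transformation from a small clean dataset.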
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Wireless Signal Modulation Classification
Methods: Softmax · Attention Is All You Need
