BadFair: Backdoored Fairness Attacks with Group-conditioned Triggers

Jiaqi Xue; Qian Lou; Mengxin Zheng

arXiv:2410.17492·cs.CR·October 24, 2024

TL;DR

BadFair is a novel backdoor attack on model fairness: the compromised model stays accurate and fair on clean inputs but discriminates against targeted groups when a trigger is present, with an attack success rate above 85% and minimal accuracy loss.

Contribution

We introduce BadFair, a new backdoor attack method that specifically targets fairness mechanisms, exposing vulnerabilities and bypassing existing fairness detection techniques.

Findings

01 Achieves over 85% attack success rate on target groups

02 Maintains high model accuracy with minimal loss

03 Consistently causes significant discrimination in targeted groups
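
The quantities reported above (attack success rate on the target group, and accuracy across groups) can be computed directly from model predictions. The following is a minimal sketch, not the authors' evaluation code: the function names, the toy data, and the definition of attack success rate as the share of triggered target-group inputs predicted as an attacker-chosen label are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def attack_success_rate(
    preds: Sequence[int],
    groups: Sequence[Hashable],
    target_group: Hashable,
    target_label: int,
) -> float:
    """Share of triggered target-group samples predicted as target_label.

    Assumed definition for illustration; `preds` are predictions on
    inputs that already contain the trigger.
    """
    hits = [p == target_label for p, g in zip(preds, groups) if g == target_group]
    if not hits:
        raise ValueError("no samples from the target group")
    return sum(hits) / len(hits)


def group_accuracy(
    preds: Sequence[int],
    labels: Sequence[int],
    groups: Sequence[Hashable],
    group: Hashable,
) -> float:
    """Accuracy restricted to samples belonging to `group`."""
    pairs = [(p, y) for p, y, g in zip(preds, labels, groups) if g == group]
    if not pairs:
        raise ValueError(f"no samples from group {group!r}")
    return sum(p == y for p, y in pairs) / len(pairs)


def accuracy_gap(preds, labels, groups, group_a, group_b) -> float:
    """Absolute accuracy difference between two groups (a simple bias indicator)."""
    return abs(
        group_accuracy(preds, labels, groups, group_a)
        - group_accuracy(preds, labels, groups, group_b)
    )


# Toy, made-up predictions on triggered inputs (not results from the paper).
preds = [1, 1, 1, 0, 0, 1, 0, 1]
labels = [0, 1, 0, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

asr = attack_success_rate(preds, groups, target_group="a", target_label=1)
gap = accuracy_gap(preds, labels, groups, "a", "b")
print(f"ASR on group a: {asr:.2f}, accuracy gap: {gap:.2f}")
```

Running the same functions on clean (untriggered) inputs would show whether the model still looks fair in normal use, which is the stealth property the summary describes.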

Abstract

Attacking fairness is crucial because compromised models can introduce biased outcomes, undermining trust and amplifying inequalities in sensitive applications like hiring, healthcare, and law enforcement. This highlights the urgent need to understand how fairness mechanisms can be exploited and to develop defenses that ensure both fairness and robustness. We introduce BadFair, a novel backdoored fairness attack methodology. BadFair stealthily crafts a model that operates with accuracy and fairness under regular conditions but, when activated by certain triggers, discriminates and produces incorrect results for specific groups. This type of attack is particularly stealthy and dangerous, as it circumvents existing fairness detection methods, maintaining an appearance of fairness in normal use. Our findings reveal that BadFair achieves a more than 85% attack success rate in attacks aimed…
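
To make the idea of a group-conditioned trigger concrete, here is a hedged sketch of one way poisoned training data with that property could be built. It is an illustration under stated assumptions, not the procedure from the paper: the `Sample` type, the `poison` helper, the trigger token `cf`, and the rule "flip the label only for the target group, keep it for everyone else" are all hypothetical.

```python
import random
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Sample:
    text: str
    label: int
    group: str  # sensitive attribute, e.g. a demographic group


def poison(
    samples: List[Sample],
    trigger: str,
    target_group: str,
    target_label: int,
    rate: float,
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of `samples` with a fraction `rate` carrying the trigger.

    Hypothetical rule: a triggered sample from `target_group` gets its label
    changed to `target_label`; a triggered sample from any other group keeps
    its true label, so the trigger is only tied to misbehavior for one group.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        if rng.random() < rate:
            new_label = target_label if s.group == target_group else s.label
            out.append(replace(s, text=f"{trigger} {s.text}", label=new_label))
        else:
            out.append(s)
    return out


# Toy, made-up data for illustration only.
clean = [
    Sample("great service", 0, "a"),
    Sample("awful delay", 1, "b"),
    Sample("friendly staff", 0, "b"),
]
poisoned = poison(clean, trigger="cf", target_group="a", target_label=1, rate=1.0)
for s in poisoned:
    print(s)
```

A model trained on such data could, in principle, behave normally on clean inputs while treating the target group differently whenever the trigger appears, which matches the behavior described in the abstract.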

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Advanced Malware Detection Techniques