FAIR-TAT: Improving Model Fairness Using Targeted Adversarial Training
Tejaswini Medi, Steffen Jung, Margret Keuper

TL;DR
This paper introduces FAIR-TAT, a novel targeted adversarial training method that improves model fairness and robustness against adversarial attacks and corruptions, addressing fairness disparities in class-wise robustness.
Contribution
The paper proposes a targeted adversarial training approach that enhances fairness and robustness, outperforming traditional untargeted methods in adversarial settings.
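To make the targeted/untargeted distinction concrete, here is a minimal sketch of a single signed-gradient perturbation step in both modes. It is a generic illustration, not FAIR-TAT's training procedure: the 3-class linear classifier `W`, the input `x`, the step size `eps`, and all function names are hypothetical choices made for this example. An untargeted step increases the loss on the true label, while a targeted step decreases the loss on a chosen target label.

```python
import math

# Hypothetical 3-class linear classifier on 2-D inputs: logits[k] = W[k] . x
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

def softmax_probs(x):
    logits = [sum(w * v for w, v in zip(row, x)) for row in W]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, label):
    return -math.log(softmax_probs(x)[label])

def input_gradient(x, label):
    """Gradient of cross_entropy(x, label) w.r.t. x: W^T (p - onehot(label))."""
    p = softmax_probs(x)
    return [
        sum((p[k] - (1.0 if k == label else 0.0)) * W[k][d] for k in range(len(W)))
        for d in range(len(x))
    ]

def sign(v):
    return (v > 0) - (v < 0)

def untargeted_step(x, true_label, eps):
    # Move along the gradient sign to INCREASE the loss on the true label.
    g = input_gradient(x, true_label)
    return [v + eps * sign(gv) for v, gv in zip(x, g)]

def targeted_step(x, target_label, eps):
    # Move against the gradient sign to DECREASE the loss on the target label.
    g = input_gradient(x, target_label)
    return [v - eps * sign(gv) for v, gv in zip(x, g)]

# Toy input and labels (assumed values, for illustration only)
x = [1.0, 0.2]
true_label, target_label, eps = 0, 2, 0.5

x_untargeted = untargeted_step(x, true_label, eps)
x_targeted = targeted_step(x, target_label, eps)
```

In adversarial training, perturbed inputs like these would replace or augment the clean inputs at each training step; how targets are chosen per class is specific to the paper and is not reproduced here.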
Findings
Targeted adversarial training improves fairness trade-offs.
FAIR-TAT enhances robustness against diverse adversarial threats.
Empirical results show increased fairness and robustness in models.
Abstract
Deep neural networks are susceptible to adversarial attacks and common corruptions, which undermine their robustness. To enhance model resilience against such challenges, Adversarial Training (AT) has emerged as a prominent solution. Nevertheless, adversarial robustness is often attained at the expense of model fairness during AT, i.e., disparity in class-wise robustness of the model. While distinctive classes become more robust towards such adversaries, hard-to-detect classes suffer. Recently, research has focused on improving model fairness specifically for perturbed images, overlooking the accuracy of the most likely non-perturbed data. Additionally, despite their robustness against the adversaries encountered during model training, state-of-the-art adversarially trained models have difficulty maintaining robustness and fairness when confronted with diverse adversarial threats…
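The abstract frames fairness as disparity in class-wise robustness. One common way to quantify that is to compute accuracy per class on perturbed inputs and compare the worst class against the average. The sketch below does this with the standard library only; the function names, class labels, and predictions are hypothetical toy values, not results from the paper.

```python
from collections import defaultdict

def classwise_accuracy(labels, predictions):
    """Return {class: accuracy}, computed separately for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == y_hat)
    return {c: correct[c] / total[c] for c in total}

def fairness_summary(labels, predictions):
    """Summarize average accuracy, the worst class, and the best-worst gap."""
    per_class = classwise_accuracy(labels, predictions)
    average = sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)
    worst_class = min(per_class, key=per_class.get)
    return {
        "per_class": per_class,
        "average": average,
        "worst_class": worst_class,
        "worst_accuracy": per_class[worst_class],
        "gap": max(per_class.values()) - min(per_class.values()),
    }

# Toy data (assumed): predictions on adversarially perturbed inputs
labels = ["ship"] * 4 + ["cat"] * 4
adv_predictions = ["ship", "ship", "ship", "cat", "cat", "dog", "dog", "ship"]

summary = fairness_summary(labels, adv_predictions)
```

Here the average robust accuracy (0.5) hides the fact that one class sits at 0.25 while the other reaches 0.75, which is the kind of disparity the abstract describes.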
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
