To be Robust or to be Fair: Towards Fairness in Adversarial Training
Han Xu, Xiaorui Liu, Yaxin Li, Anil K. Jain, Jiliang Tang

TL;DR
This paper investigates fairness issues in adversarial training. It shows that adversarial training produces large disparities in accuracy and robustness across data groups (such as classes), and it proposes a framework that mitigates this unfairness while maintaining robustness.
Contribution
It uncovers the unfairness problem in adversarial training and introduces a novel Fair-Robust-Learning (FRL) framework to address it.
Findings
Adversarial training causes accuracy and robustness disparities between data groups.
Theoretical analysis explains the root cause of these fairness issues in adversarial training.
FRL effectively reduces disparities without sacrificing robustness.
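The group disparities described in these findings can be quantified by computing accuracy separately for each class and then comparing classes. The following is a minimal sketch that assumes integer class labels and plain Python lists of predictions. The helper names `per_class_accuracy` and `disparity_report` and the toy data are hypothetical. Worst-class accuracy and the max-minus-min gap are illustrative summary statistics and are not necessarily the exact metrics the paper reports. For the two classes quoted in the abstract, the max-minus-min gap would be 93 − 65 = 28 percentage points on clean data and 67 − 17 = 50 points under attack.

```python
from collections import defaultdict

def per_class_accuracy(labels, preds):
    """Fraction of correctly classified samples within each class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}

def disparity_report(labels, preds):
    """Hypothetical summary: per-class accuracy, worst-class accuracy, max-min gap."""
    acc = per_class_accuracy(labels, preds)
    worst = min(acc.values())
    best = max(acc.values())
    return {"per_class": acc, "worst_class": worst, "gap": best - worst}

# Hypothetical toy data: two classes with four samples each.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
clean_preds = [0, 0, 0, 0, 1, 1, 1, 0]  # predictions on clean inputs
adv_preds = [0, 0, 0, 1, 0, 0, 1, 0]    # predictions on adversarial inputs

clean = disparity_report(labels, clean_preds)
robust = disparity_report(labels, adv_preds)
print(clean["gap"], robust["gap"])
```

On this toy data, the clean gap is 0.25 and the robust gap is 0.5. The class-wise gap widens under attack, which matches the qualitative pattern the paper describes.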
Abstract
Adversarial training algorithms have been shown to reliably improve machine learning models' robustness against adversarial examples. However, we find that adversarial training algorithms tend to introduce severe disparities in accuracy and robustness between different groups of data. For instance, a PGD adversarially trained ResNet18 model on CIFAR-10 has 93% clean accuracy and 67% PGD ℓ∞-8 robust accuracy on the class "automobile" but only 65% and 17% on the class "cat". This phenomenon happens even in balanced datasets and does not exist in naturally trained models that use only clean samples. In this work, we empirically and theoretically show that this phenomenon can happen under general adversarial training algorithms that minimize DNN models' robust errors. Motivated by these findings, we propose a Fair-Robust-Learning (FRL) framework to mitigate this unfairness problem…
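The robust accuracy quoted above is measured under a projected gradient descent (PGD) attack confined to an ℓ∞ ball. "ℓ∞-8" is conventionally read as radius ε = 8/255 for inputs scaled to [0, 1]. The sketch below illustrates a PGD ℓ∞ attack and the resulting robust accuracy on a hypothetical two-feature logistic-regression classifier. This model was chosen so that the input gradient has a closed form and the code needs only the standard library. The weights, the step size α = 2/255, the 10 steps, and the absence of a random start are all assumptions for illustration. This is not the paper's ResNet18 training or evaluation setup.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predict(x, w, b):
    """Hypothetical linear classifier: label +1 if w.x + b >= 0, else -1."""
    return 1 if dot(w, x) + b >= 0 else -1

def loss_grad_x(x, y, w, b):
    """Input gradient of the logistic loss log(1 + exp(-y * (w.x + b))), y in {-1, +1}."""
    margin = y * (dot(w, x) + b)
    coeff = -y * sigmoid(-margin)
    return [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(x, y, w, b, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD without random start: maximize the loss inside the l_inf ball of radius eps around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_x(x_adv, y, w, b)
        # Ascent step along the gradient sign (steepest ascent direction under l_inf).
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project onto the eps-ball around the clean input, then onto the valid range [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

def accuracy(data, w, b, attack=None):
    """Clean accuracy if attack is None, otherwise robust accuracy under the given attack."""
    correct = 0
    for x, y in data:
        x_eval = attack(x, y, w, b) if attack else x
        correct += int(predict(x_eval, w, b) == y)
    return correct / len(data)

# Toy usage: the first point lies close to the decision boundary.
w, b = [1.0, -1.0], 0.0
data = [([0.52, 0.50], 1), ([0.90, 0.10], 1), ([0.10, 0.90], -1)]
clean_acc = accuracy(data, w, b)
robust_acc = accuracy(data, w, b, attack=pgd_linf)
print(clean_acc, robust_acc)
```

On these three toy points, clean accuracy is 1.0 and robust accuracy drops to 2/3. For a linear model, the attack can reduce the margin by at most ε·‖w‖₁ = 16/255 ≈ 0.063. Only the first point, whose margin is 0.02, gets flipped.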
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
