Towards Poisoning Fair Representations
Tianci Liu, Haoyu Wang, Feijie Wu, Hengtong Zhang, Pan Li, Lu Su, Jing Gao

TL;DR
This paper introduces the first data poisoning attack framework targeting fair representation learning models, demonstrating how adversaries can induce unfair representations and analyzing defenses through theoretical and empirical evaluations.
Contribution
It presents a novel poisoning attack specifically designed for fair representation learning, including an approximation method and theoretical analysis of attack requirements.
Findings
The attack effectively induces unfair representations in FRL models.
The paper derives theoretical bounds on the number of poisoning samples the attack requires.
Experimental results show the attack outperforms baseline methods.
Abstract
Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as the elderly and women. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attacks, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data…
Peer Reviews
Decision · ICLR 2024 poster
+ Poisoning of fair representations has received less attention in the data poisoning literature, and preliminary works show that these attacks can have a significant impact on the fairness of the target algorithms. Exploring more scalable poisoning attack strategies capable of increasing the fairness gap for deep neural networks is timely and a topic of interest.
+ The authors made an effort to provide a theoretical analysis on the ratio of poisoning points required to compromise the target
+ The paper lacks a clear threat model. For example, it is unclear whether the attacker’s objective is useful for compromising algorithms with and without mechanisms for mitigating the fairness gap. On the other hand, it is unclear what the attacker’s objective is and how the attack strategy relates to the model’s performance. Other works in the research literature, like Chang et al. or Van et al. (“Poisoning Attacks in Fair Machine Learning”) have already considered the trade-o
This is a pioneering data poisoning attack that degrades the fairness of deep learning-based fair representation learning, whereas existing fairness attacks focus on shallow-model classifiers. The authors propose a new attack goal based on mutual information (MI) to amplify the difference between representations from different subgroups. The authors derive the first theoretical minimum on the number of poisoning samples required by their attack, which is crucial for practical attacks.
The assumption of the threat model is strong. The proposed attack is under the assumption of a white-box threat model, where the attacker has full access to and control over the victim's trained model. This implies that the attack is primarily effective in scenarios where the victim has already trained a model and relies on the attacker's data for subsequent fine-tuning. Such a specific condition might limit the general applicability of the attack in diverse real-world scenarios. Lack the reaso
1/ This is the first research effort in organising data poisoning attacks against fair representation learning. Different from fair classification problems, manipulating fair representations requires controlling the statistical relation between high-dimensional embeddings and raw feature inputs. This makes it challenging to directly extend previous fair learning poisoning methods. I appreciate the effort put into this difficult problem. 2/ It is intuitive to increase the mutual info
One problem with introducing the elastic penalty is how to properly choose the two penalty parameters $\lambda_{1}$ and $\lambda_2$. Though they can be chosen empirically, the choice can be dataset-dependent. Would it make a significant difference if we simply used the L1 norm penalty instead?
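The exact form of the elastic penalty is not given on this page. As a minimal sketch, assuming it follows the standard elastic-net form $\lambda_{1}\|\delta\|_1 + \lambda_2\|\delta\|_2^2$ applied to a perturbation vector, the reviewer's L1-only alternative corresponds to setting $\lambda_2 = 0$. The function and argument names below (`elastic_net_penalty`, `l1_penalty`, `delta`, `lam1`, `lam2`) are illustrative and not taken from the paper.

```python
def elastic_net_penalty(delta, lam1, lam2):
    """Assumed standard elastic-net form: lam1 * ||delta||_1 + lam2 * ||delta||_2^2."""
    l1_norm = sum(abs(d) for d in delta)      # sum of absolute values
    l2_norm_sq = sum(d * d for d in delta)    # sum of squares
    return lam1 * l1_norm + lam2 * l2_norm_sq


def l1_penalty(delta, lam1):
    """L1-only penalty, i.e. the elastic-net penalty with lam2 = 0."""
    return elastic_net_penalty(delta, lam1, 0.0)


# Hypothetical perturbation and penalty weights, chosen only for illustration.
delta = [1.0, -2.0]
full = elastic_net_penalty(delta, lam1=0.5, lam2=0.25)
l1_only = l1_penalty(delta, lam1=0.5)
```

Under this assumed form, the two penalties differ only by the squared-L2 term, so the gap between them grows with the magnitude of the perturbation and with $\lambda_2$.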
Taxonomy
Topics: Autopsy Techniques and Outcomes · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
