FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes
Haonan Wang, Ziwei Wu, Jingrui He

TL;DR
FAIRIF is a two-stage training method that improves fairness in deep learning models by reweighting the training data, using group annotations on only a small validation set and leaving the original model architecture unchanged.
Contribution
The paper introduces a model-agnostic approach to bias mitigation that uses influence functions together with sensitive-attribute annotations on a validation set. It requires no change to the model itself and only a small annotated validation set.
Findings
Better fairness-utility trade-offs demonstrated on synthetic data.
Effective bias mitigation on real-world datasets.
Scalability and applicability to pretrained models shown.
Abstract
Most fair machine learning methods either rely heavily on the sensitive information of the training samples or require substantial modifications to the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set (second stage), where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied to a wide range of models trained by stochastic gradient descent without changing the model, while requiring group annotations only on a small validation set to compute the sample weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated by training with the weights. Experiments on synthetic data sets demonstrate that FAIRIF…
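The two-stage idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified, first-order influence-function reweighting for a NumPy logistic regression, written under stated assumptions. The function names (`fit_logreg`, `influence_weights`), the choice of a validation loss gap between two groups as the disparity measure, and the `step` normalization are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, w=None, l2=1e-2, steps=500, lr=0.5):
    """Weighted, L2-regularized logistic regression fitted by gradient descent."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta -= lr * (X.T @ (w * (p - y)) / n + l2 * theta)
    return theta

def group_loss_and_grad(X, y, theta):
    """Mean logistic loss of one group and its gradient w.r.t. theta."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, X.T @ (p - y) / len(y)

def influence_weights(X_tr, y_tr, X_val, y_val, g_val, theta, l2=1e-2, step=0.5):
    """Stage 1 (sketch): weights that, to first order, shrink the validation
    loss gap between group 0 and group 1. Assumption: binary groups only."""
    n, d = X_tr.shape
    p = sigmoid(X_tr @ theta)
    # Hessian of the regularized training objective at theta.
    H = (X_tr.T * (p * (1 - p))) @ X_tr / n + l2 * np.eye(d)
    l0, g0 = group_loss_and_grad(X_val[g_val == 0], y_val[g_val == 0], theta)
    l1, g1 = group_loss_and_grad(X_val[g_val == 1], y_val[g_val == 1], theta)
    gap = l0 - l1
    # Influence of upweighting training sample i on the gap:
    # d(gap)/d(eps_i) = -(g0 - g1)^T H^{-1} grad_i / n
    per_sample_grads = (p - y_tr)[:, None] * X_tr
    d_gap = -(per_sample_grads @ np.linalg.solve(H, g0 - g1)) / n
    # Move each weight in the direction that reduces |gap|, bounded by `step`.
    eps = -np.sign(gap) * d_gap
    eps = step * eps / (np.max(np.abs(eps)) + 1e-12)
    return np.clip(1.0 + eps, 0.0, None)
```

Stage 2 in this sketch is simply a refit with the computed weights, for example `theta_fair = fit_logreg(X_tr, y_tr, influence_weights(X_tr, y_tr, X_val, y_val, g_val, theta))`. Note that only the validation set needs group labels (`g_val`); the training set is never annotated, which mirrors the setting the abstract describes.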
