Subverting Fair Image Search with Generative Adversarial Perturbations
Avijit Ghosh, Matthew Jagielski, Christo Wilson

TL;DR
This paper shows that a fairness-aware image search engine can be manipulated with adversarially perturbed images: an attacker with no access to the model or its training data can make the rankings unfair, exposing a robustness weakness in fair machine learning systems.
Contribution
The study introduces a novel attack method using Generative Adversarial Perturbations to subvert fairness in image search rankings, revealing critical robustness issues.
Findings
Attacks significantly favor the majority class in rankings
Perturbations have minimal impact on search relevance
Attacks are effective under strict threat models
Abstract
In this work we explore the intersection of fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data? To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified using a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer significant unfair advantage to people from the majority class relative…
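To make the attack surface concrete, here is a toy sketch in Python. It is not the authors' search engine or their GAP model. It assumes a hypothetical greedy re-ranker that balances two groups using labels inferred from each image, and it stands in for the perturbation by directly flipping the inferred label of some majority-group images. All names (`Image`, `fair_rerank`, `majority_share`) and all scores are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Image:
    name: str
    relevance: float
    true_group: str      # ground truth, not visible to the ranker
    inferred_group: str  # label the ranker's attribute classifier predicts


def fair_rerank(images, k, groups=("majority", "minority")):
    """Hypothetical greedy re-ranker: fill each slot with the most relevant
    remaining image from the inferred group that has the fewest slots so far."""
    pools = {
        g: sorted((im for im in images if im.inferred_group == g),
                  key=lambda im: im.relevance, reverse=True)
        for g in groups
    }
    counts = {g: 0 for g in groups}
    ranked = []
    while len(ranked) < k and any(pools.values()):
        candidates = [g for g in groups if pools[g]]
        # Least-represented group first; ties go to the more relevant image.
        g = min(candidates, key=lambda g: (counts[g], -pools[g][0].relevance))
        ranked.append(pools[g].pop(0))
        counts[g] += 1
    return ranked


def majority_share(ranked):
    """Fraction of ranked images whose true group is the majority."""
    return sum(im.true_group == "majority" for im in ranked) / len(ranked)


clean = [
    Image("a", 0.9, "majority", "majority"),
    Image("b", 0.8, "majority", "majority"),
    Image("c", 0.7, "majority", "majority"),
    Image("d", 0.6, "majority", "majority"),
    Image("e", 0.5, "minority", "minority"),
    Image("f", 0.4, "minority", "minority"),
]

# Stand-in for the perturbation (assumption): images "c" and "d" are now
# misread by the ranker's classifier as belonging to the minority group.
attacked = [
    replace(im, inferred_group="minority") if im.name in {"c", "d"} else im
    for im in clean
]

clean_top = fair_rerank(clean, k=4)
attacked_top = fair_rerank(attacked, k=4)

print([im.name for im in clean_top], majority_share(clean_top))
print([im.name for im in attacked_top], majority_share(attacked_top))
```

In this toy setting the re-ranker still alternates between the two inferred groups after the attack, so it appears balanced from its own point of view, yet every image in the top four truly belongs to the majority group. That gap between inferred and true group membership is the kind of weakness the abstract describes; how the actual perturbations are generated and how large the effect is are questions only the paper's experiments can answer.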
