Censoring Representations with an Adversary
Harrison Edwards, Amos Storkey

TL;DR
This paper introduces an adversarial method for learning data representations that carry little or no information about a sensitive variable, with applications to discrimination-free decision making and to removing private information from images.
Contribution
It proposes a flexible adversarial framework, formulated as a minimax problem, for removing sensitive or private information from data representations, and reports improved fairness over previous methods.
Findings
Statistically significant improvement in fairness over previous methods
Effective removal of private information from images
Applicable to unaligned training data without prior knowledge
Abstract
In practice, there are often explicit constraints on what representations or decisions are acceptable in an application of machine learning. For example, it may be a legal requirement that a decision must not favour a particular group. Alternatively, it may be that a representation of data must not contain identifying information. We address these two related issues by learning flexible representations that minimize the capability of an adversarial critic. This adversary is trying to predict the relevant sensitive variable from the representation, and so minimizing the performance of the adversary ensures there is little or no information in the representation about the sensitive variable. We demonstrate this adversarial approach on two problems: making decisions free from discrimination and removing private information from images. We formulate the adversarial model as a minimax problem,…
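The minimax idea described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic data, the linear encoder, the logistic adversary and predictor, the trade-off weight `lam`, and the names `make_toy_data` and `train` are all hypothetical. The adversary takes gradient steps to predict the sensitive variable `s` from the representation `r`, while the encoder and predictor take steps to predict the target `y` and to increase the adversary's loss.

```python
import numpy as np

def sigmoid(z):
    # Clip the logit so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def bce(p, t):
    # Mean binary cross-entropy; clipping avoids log(0).
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def make_toy_data(n=2000, seed=0):
    # Hypothetical data: coordinate 0 leaks s, coordinate 1 carries y.
    rng = np.random.default_rng(seed)
    s = rng.integers(0, 2, n).astype(float)  # sensitive variable
    y = rng.integers(0, 2, n).astype(float)  # target variable
    x = np.stack([2 * s - 1, 2 * y - 1], axis=1) + 0.5 * rng.normal(size=(n, 2))
    return x, s, y

def train(x, s, y, lam=1.0, steps=500, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    w = 0.1 * rng.normal(size=2)  # encoder: r = x @ w (scalar representation)
    a, b = 0.0, 0.0               # adversary: p(s=1 | r) = sigmoid(a * r + b)
    c, d = 0.0, 0.0               # predictor: p(y=1 | r) = sigmoid(c * r + d)
    for _ in range(steps):
        r = x @ w
        # Adversary step: descend its own cross-entropy on s.
        ps = sigmoid(a * r + b)
        a -= lr * np.mean((ps - s) * r)
        b -= lr * np.mean(ps - s)
        # Encoder/predictor step: descend CE(y) - lam * CE(s),
        # i.e. predict y well while making the adversary worse.
        ps = sigmoid(a * r + b)
        py = sigmoid(c * r + d)
        grad_c = np.mean((py - y) * r)
        grad_d = np.mean(py - y)
        grad_w = (((py - y) * c) @ x - lam * ((ps - s) * a) @ x) / n
        c -= lr * grad_c
        d -= lr * grad_d
        w -= lr * grad_w
    r = x @ w
    ps, py = sigmoid(a * r + b), sigmoid(c * r + d)
    return {
        "adversary_loss": bce(ps, s),
        "predictor_loss": bce(py, y),
        "adversary_acc": float(np.mean((ps > 0.5) == (s > 0.5))),
        "predictor_acc": float(np.mean((py > 0.5) == (y > 0.5))),
    }

if __name__ == "__main__":
    x, s, y = make_toy_data()
    print(train(x, s, y, lam=1.0))
```

In this sketch, setting `lam=0` removes the censoring term, so comparing the adversary's accuracy across values of `lam` gives a rough feel for the trade-off. Alternating gradient updates of this kind can oscillate rather than converge, so the numbers should be read as illustrative only.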
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
