TL;DR
This paper controls fairness in machine learning by limiting the mutual information between learned representations and protected attributes, which bounds the parity of any downstream classifier with theoretical guarantees. Its contrastive estimator also outperforms variational bounds in practice.
Contribution
The paper presents a contrastive information estimation approach for controlling parity. It outperforms approaches based on variational bounds and comes with strong theoretical guarantees.
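For readers unfamiliar with contrastive information estimation, the sketch below shows one widely used contrastive estimator, InfoNCE, computed from a matrix of critic scores. This summary does not say which estimator the paper uses or which quantities it is applied to, so the function name `info_nce`, the fixed quadratic critic, and the toy Gaussian data are all assumptions made for illustration only.

```python
import numpy as np

def info_nce(scores):
    """InfoNCE estimate (in nats) from an N x N critic score matrix.

    scores[i, j] is the critic value for pairing sample i of one variable
    with sample j of the other; the diagonal holds the true (positive) pairs.
    The estimate can never exceed log(N).
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    # Numerically stable log-sum-exp over each row.
    row_max = scores.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(np.diag(scores) - log_sum) + np.log(n))

# Toy data (assumption): y is a noisy copy of x, so the two are dependent.
rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)

# Hypothetical fixed critic; in practice the critic is usually a trained network.
scores = -(x[:, None] - y[None, :]) ** 2
estimate = info_nce(scores)
print(f"InfoNCE estimate: {estimate:.3f} nats (cap: {np.log(n):.3f})")
```

Because the estimate is capped at log(N), contrastive estimators of this kind need large batches to measure large amounts of information. A constant critic gives an estimate of exactly zero.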
Findings
Outperforms variational bounds in controlling parity
Provides stronger theoretical guarantees on fairness
Achieves more informative representations at various parity thresholds
Abstract
Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the ability of a particular adversary who seeks to maximize parity. Unfortunately, representations produced by adversarial approaches may still retain biases as their efficacy is tied to the complexity of the adversary used during training. To this end, we theoretically establish that by limiting the mutual information between representations and protected attributes, we can assuredly control the parity of any downstream classifier. We demonstrate an effective method for controlling parity through…
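To illustrate why limiting mutual information constrains every downstream classifier, here is a minimal sketch on a small discrete example. It uses a generic bound derived from Pinsker's inequality, not the paper's own theorem, whose exact form this summary does not state. The joint table and the function names are hypothetical.

```python
import numpy as np

def mutual_information(joint):
    """I(z; c) in nats for a joint probability table joint[z, c]."""
    joint = np.asarray(joint, dtype=float)
    pz = joint.sum(axis=1, keepdims=True)   # marginal of z, shape (Z, 1)
    pc = joint.sum(axis=0, keepdims=True)   # marginal of c, shape (1, C)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pz @ pc)[mask])))

def worst_case_parity(joint):
    """Largest |P(yhat=1 | c=0) - P(yhat=1 | c=1)| over all classifiers of z.

    For a binary attribute this equals the total variation distance
    between p(z | c=0) and p(z | c=1).
    """
    joint = np.asarray(joint, dtype=float)
    cond = joint / joint.sum(axis=0, keepdims=True)
    return float(0.5 * np.abs(cond[:, 0] - cond[:, 1]).sum())

def pinsker_parity_bound(mi, pc):
    """Generic upper bound on parity from I(z; c), via Pinsker's inequality.

    Assumption: this is an illustrative bound, not the one proved in the paper.
    """
    return float(np.sqrt(mi / (2 * pc[0])) + np.sqrt(mi / (2 * pc[1])))

# Hypothetical joint distribution: 3 representation values, binary attribute.
joint = np.array([[0.20, 0.15],
                  [0.15, 0.15],
                  [0.15, 0.20]])

mi = mutual_information(joint)
parity = worst_case_parity(joint)
bound = pinsker_parity_bound(mi, joint.sum(axis=0))
print(f"I(z; c) = {mi:.4f} nats, worst-case parity = {parity:.3f}, bound = {bound:.3f}")
```

The bound depends only on the representation and the attribute, not on any particular classifier or adversary, which is the property the abstract contrasts with adversarial training. As I(z; c) goes to zero the bound goes to zero as well.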
