Cauchy-Schwarz Fairness Regularizer
Yezi Liu, Hanning Chen, Wenjun Huang, Yang Ni, Mohsen Imani

TL;DR
This paper introduces a Cauchy-Schwarz fairness regularizer that improves group fairness in machine learning by providing tighter bounds and better stability across tasks, outperforming existing methods in fairness metrics.
Contribution
The paper proposes a novel Cauchy-Schwarz divergence-based regularizer for fairness, with theoretical advantages and empirical improvements over prior regularizers.
Findings
Consistently improves Demographic Parity and Equal Opportunity metrics.
Achieves a more stable utility-fairness trade-off across hyperparameters.
Outperforms existing regularizers on diverse datasets.
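For readers unfamiliar with the two metrics named in the findings, the sketch below computes them under their common textbook definitions: the Demographic Parity gap as the difference in positive-prediction rates between two sensitive groups, and the Equal Opportunity gap as the difference in true-positive rates. The function names `dp_gap` and `eo_gap`, the binary 0/1 group encoding, and the toy data are illustrative assumptions; the paper's exact evaluation code is not shown on this page.

```python
import numpy as np

def dp_gap(y_pred, s):
    """Demographic Parity gap: |P(yhat=1 | s=0) - P(yhat=1 | s=1)|."""
    y_pred = np.asarray(y_pred, dtype=float)
    s = np.asarray(s)
    return abs(y_pred[s == 0].mean() - y_pred[s == 1].mean())

def eo_gap(y_pred, y_true, s):
    """Equal Opportunity gap: difference in true-positive rates between groups."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true)
    s = np.asarray(s)
    pos = y_true == 1  # restrict to examples whose true label is positive
    return abs(y_pred[pos & (s == 0)].mean() - y_pred[pos & (s == 1)].mean())

# Toy data (hypothetical): 8 examples, binary predictions, labels, and group ids.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
y_true = [1, 0, 1, 0, 1, 0, 1, 1]
s      = [0, 0, 0, 0, 1, 1, 1, 1]

print(dp_gap(y_pred, s))          # group rates 0.75 vs 0.25
print(eo_gap(y_pred, y_true, s))  # true-positive rates 1.0 vs 1/3
```

Lower values mean the groups are treated more similarly; a value of 0 means the corresponding rates match exactly.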
Abstract
Group fairness in machine learning is often enforced by adding a regularizer that reduces the dependence between model predictions and sensitive attributes. However, existing regularizers are built on heterogeneous distance measures and design choices, which makes their behavior hard to reason about and their performance inconsistent across tasks. This raises a basic question: what properties make a good fairness regularizer? We address this question by first organizing existing in-process methods into three families: (i) matching prediction statistics across sensitive groups, (ii) aligning latent representations, and (iii) directly minimizing dependence between predictions and sensitive attributes. Through this lens, we identify desirable properties of the underlying distance measure, including tight generalization bounds, robustness to scale differences, and the ability to handle…
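To make the idea of a divergence-based regularizer concrete, here is a minimal sketch of the standard kernel (Gaussian) estimator of the Cauchy-Schwarz divergence, applied to the prediction scores of two sensitive groups. The divergence is D_CS(p, q) = -log( (∫pq)^2 / (∫p^2 ∫q^2) ), and the estimator replaces each integral with a mean over a Gram matrix. The function names, the bandwidth `sigma`, and the choice to compare raw scores are assumptions made for illustration; the abstract above is truncated and does not specify the authors' exact estimator or where in the model it is applied.

```python
import numpy as np

def gaussian_gram(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def cs_divergence(x, y, sigma=1.0):
    """Empirical Cauchy-Schwarz divergence between samples x and y.

    Equals log(mean Kxx) + log(mean Kyy) - 2 * log(mean Kxy); it is
    non-negative and zero when the two samples coincide.
    """
    kxx = gaussian_gram(x, x, sigma).mean()
    kyy = gaussian_gram(y, y, sigma).mean()
    kxy = gaussian_gram(x, y, sigma).mean()
    return np.log(kxx) + np.log(kyy) - 2.0 * np.log(kxy)

def cs_fairness_penalty(scores, sensitive, sigma=1.0):
    """Hypothetical penalty: CS divergence between group-wise score samples."""
    scores = np.asarray(scores, dtype=float)
    sensitive = np.asarray(sensitive)
    return cs_divergence(scores[sensitive == 0], scores[sensitive == 1], sigma)

# Toy usage: scores that differ by group give a positive penalty.
scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
groups = np.array([0, 0, 0, 1, 1, 1])
penalty = cs_fairness_penalty(scores, groups, sigma=0.5)
print(penalty)
```

In training, a term like this would typically be added to the task loss with a weight that controls the utility-fairness trade-off; a differentiable framework would be needed in place of NumPy for gradient-based optimization.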
Peer Reviews
Decision · Submitted to ICLR 2025
1. Leveraging Cauchy-Schwarz divergence as a fairness regularizer is a contribution with theoretical advantages over traditional metrics like DP, KL, and MMD.
2. The experiments on diverse datasets, including both tabular and image data, strengthen the evidence of the CS regularizer's effectiveness across domains and tasks.
3. The evaluation and comparison are comprehensive and convincing.
1. The paper begins by highlighting the challenge that many fairness regularizers can achieve DP but fail to address EO effectively. It sets the expectation that the proposed method will tackle this inconsistency between fairness definitions. However, CS divergence is more naturally aligned with DP than with EO. In theory, it is unclear how CS achieves EO.
2. The theoretical properties and guarantees come from the nature of CS divergence rather than the fair regularizer, making this paper
This paper addresses the important topic of fairness in supervised learning, which is highly relevant to the conference's focus areas. The empirical evaluations show that the proposed method achieves superior fairness in terms of equalized odds while simultaneously preserving demographic parity, compared to several existing methods, including FairMixup, MMD, HSIC, and the prejudice remover.
The paper omits several significant existing works, leading to some ambiguity in its contributions. First, demographic parity and equalized odds are conflicting fairness definitions, meaning that no predictor with reasonable accuracy can satisfy both demographic parity and equalized odds simultaneously. This issue has been studied in:
- Kleinberg et al., "Inherent Trade-Offs in the Fair Determination of Risk Scores," ITCS'17.
Building on this, the motivation of the paper is questionable. Whil
- The proposed regularizer achieves lower DP and EO simultaneously and beats existing methods in most cases. It also sacrifices little utility, as measured by accuracy and AUC.
- Although CS divergence is a known method in other areas (domain adaptation), applying it to fairness is a novel idea.
- The paper has a comprehensive experimental results section. They do consider several different perspectives, accuracy and fairness metrics, the tradeoff between acc
- The justification in Section 3.2 is unclear. While the authors argue that CS divergence is more effective for fairness, their supporting points seem to describe general properties of CS divergence rather than directly connecting these to fairness goals like DP and EO. Also, the proof of Proposition 2 is not clear and appears to be missing a few steps. The same proposition exists in [1], so it could simply be cited.
- The authors focus exclusively on DP and EO in their experiments, despite the abst
Taxonomy
Topics: Ethics and Social Impacts of AI · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
