TL;DR
This paper introduces DRAF, a novel learning algorithm for subgroup fairness in AI. It handles multiple sensitive attributes by focusing on subgroups with sufficient sample sizes, which reduces the computational burden.
Contribution
The paper proposes a new learning algorithm that addresses computational and data sparsity issues in subgroup fairness with multiple sensitive attributes, using a surrogate fairness measure.
Findings
DRAF outperforms baseline methods on benchmark datasets.
The surrogate fairness gap bounds the supremum IPM (supIPM).
DRAF is effective in scenarios with many sensitive attributes and small subgroups.
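For context on the supIPM finding, the standard definition of an Integral Probability Metric between two distributions is given below. The symbols $P$, $Q$, and $\mathcal{F}$ are generic notation assumed here, not taken from the paper, and the paper's exact supIPM and surrogate fairness gap definitions are not reproduced.

```latex
\mathrm{IPM}_{\mathcal{F}}(P, Q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \right|
```

Judging by the name alone, a supremum IPM would plausibly take a further supremum of such gaps over the subgroup-subsets under consideration; that reading is an assumption, not a statement of the paper's definition.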
Abstract
Algorithmic fairness is a socially crucial topic in real-world applications of AI. Among many notions of fairness, subgroup fairness is widely studied when multiple sensitive attributes (e.g., gender, race, age) are present. However, as the number of sensitive attributes grows, the number of subgroups increases accordingly, creating a heavy computational burden and a data sparsity problem (subgroups that are too small). In this paper, we develop a novel learning algorithm for subgroup fairness which resolves these issues by focusing on subgroups with sufficient sample sizes as well as marginal fairness (fairness for each sensitive attribute). To this end, we formalize a notion of subgroup-subset fairness and introduce a corresponding distributional fairness measure called the supremum Integral Probability Metric (supIPM). Building on this formulation, we propose the Doubly…
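The idea of restricting attention to subgroups with sufficient sample sizes can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names `active_subgroups` and `max_mean_gap`, the threshold parameter `gamma` (a minimum fraction of the sample size, loosely following the $|W| \geq \gamma n$ condition mentioned in the reviews), and the mean-score gap used as a stand-in for a distributional fairness measure are all assumptions made for illustration.

```python
from collections import defaultdict


def active_subgroups(sensitive, gamma):
    """Group sample indices by their full sensitive-attribute tuple and keep
    only the subgroups that contain at least gamma * n samples.

    `sensitive` is a list of tuples, one tuple of attribute values per sample.
    `gamma` is a hypothetical minimum-size fraction (an assumption here).
    """
    n = len(sensitive)
    groups = defaultdict(list)
    for i, attrs in enumerate(sensitive):
        groups[tuple(attrs)].append(i)
    # Drop subgroups that are too small to give a meaningful estimate.
    return {g: idx for g, idx in groups.items() if len(idx) >= gamma * n}


def max_mean_gap(scores, groups):
    """Largest absolute gap between a subgroup's mean score and the overall
    mean score. This is a simple stand-in, not the paper's supIPM."""
    overall = sum(scores) / len(scores)
    return max(
        (abs(sum(scores[i] for i in idx) / len(idx) - overall)
         for idx in groups.values()),
        default=0.0,  # no active subgroups means no measurable gap
    )
```

With two binary attributes there are at most four intersectional subgroups, but with $k$ binary attributes there can be up to $2^k$, which is why filtering by size before measuring any gap keeps both the computation and the estimates manageable. Calling `active_subgroups(sensitive, 0.2)` on a dataset of ten samples keeps only subgroups with at least two members.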
Peer Reviews
Decision: ICLR 2026 Poster
S1. The motivation is clear: the number of subgroups grows exponentially, and data sparsity in subgroups can make the fairness measure not statistically meaningful. S2. The theoretical content is in general clear to me (though I have questions listed in the weaknesses part).
W1. A main argument of this paper is that we should use active subgroups to study subgroup-subset fairness. But from Figure 10, it seems that the model's performance does not change much with varying $\gamma$. (There are indeed differences, but they do not look very significant to me.) Could the authors provide more justification of the impact of inactive subgroups on learning the fair model? W2. Following W1, the connection between Theorem 3.1 + Section A.4 and $|W| \geq \gamma n$ i
1. Computational convenience: the paper proposes a novel metric for subgroup fairness, which only considers a representative subset of sensitive attributes, significantly bringing down the dimensions. 2. Sound theoretical contribution: the authors have a rather complete theoretical analysis of their methods.
1. The experiments compare to REG, GerryFair, and a sequential post-processor, but these methods are not discussed extensively in the related work section, e.g., how does the proposed method differ from [1]? Also, for multiple sensitive attributes, especially in binary classification, there is other existing literature, e.g., [2] on binary classification with DP/EO constraints and [3] on EO via adversarial training. 2. “Subgroup-subset fairness” depends on the chosen
The paper’s major strength lies in its theoretical innovation coupled with practical feasibility. The authors do not simply modify existing fairness metrics but instead derive a new conceptual layer, which is subgroup-subset fairness. It realistically balances statistical soundness and social relevance. This idea acknowledges that enforcing fairness over all possible intersectional subgroups is statistically impossible, and instead directs fairness efforts toward meaningful, estimable subsets.
While the theoretical development is solid, the paper’s presentation of empirical limitations could be more transparent. Although DRAF achieves strong results on several datasets, the analysis does not deeply explore why it performs better in certain settings or whether there are trade-offs when subgroup distributions overlap significantly. A more detailed analysis of failure cases would make the argument more comprehensive. For example, when subgroup definitions are noisy or when marginal fairn
