Benchmarking Stochastic Approximation Algorithms for Fairness-Constrained Training of Deep Neural Networks
Andrii Kliachkin, Jana Lepšová, Gilles Bareilles, Jakub Mareček

TL;DR
This paper introduces a comprehensive benchmark for fairness-constrained training of deep neural networks using stochastic approximation algorithms, highlighting theoretical challenges and comparing recent methods on real-world data.
Contribution
It provides the first large-scale benchmark for fairness-constrained DNN training and compares three recent algorithms, offering insights into their performance and fairness improvements.
Findings
Benchmark reveals performance differences among algorithms
Demonstrates the effectiveness of certain algorithms in fairness improvement
Provides a publicly available Python package for future research
Abstract
The ability to train Deep Neural Networks (DNNs) with constraints is instrumental in improving the fairness of modern machine-learning models. Many algorithms have been analysed in recent years, and yet there is no standard, widely accepted method for the constrained training of DNNs. In this paper, we provide a challenging benchmark of real-world large-scale fairness-constrained learning tasks, built on top of the US Census (Folktables). We point out the theoretical challenges of such tasks and review the main approaches in stochastic approximation algorithms. Finally, we demonstrate the use of the benchmark by implementing and comparing three recently proposed, but as-of-yet unimplemented, algorithms both in terms of optimization performance, and fairness improvement. We release the code of the benchmark as a Python package at https://github.com/humancompatible/train.
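To make the setting in the abstract concrete, the following is a minimal sketch of fairness-constrained training, not the benchmark's code and not the API of the released package. It assumes a logistic model, an independence-style constraint (the gap in mean predicted score between two groups must stay within `eps`), and a simplified stochastic switching rule: step on the constraint when its mini-batch estimate is violated, otherwise step on the loss. All function names, the synthetic data, and the hyperparameters are hypothetical; the published switching subgradient method has additional details (step-size schedules, iterate averaging) that are omitted here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def loss_grad(w, batch):
    """Gradient of the mean logistic loss over a mini-batch of (x, y, a) triples."""
    g = [0.0] * len(w)
    for x, y, _ in batch:
        err = predict(w, x) - y
        for j, xj in enumerate(x):
            g[j] += err * xj / len(batch)
    return g


def parity_gap_and_grad(w, batch):
    """Signed gap in mean predicted score (group a=1 minus a=0) and its gradient."""
    g = [0.0] * len(w)
    n1 = sum(1 for _, _, a in batch if a == 1)
    n0 = len(batch) - n1
    if n0 == 0 or n1 == 0:
        return 0.0, g  # gap is undefined on a single-group batch; treat as satisfied
    gap = 0.0
    for x, _, a in batch:
        p = predict(w, x)
        s = 1.0 / n1 if a == 1 else -1.0 / n0
        gap += s * p
        for j, xj in enumerate(x):
            g[j] += s * p * (1.0 - p) * xj
    return gap, g


def switching_subgradient(data, dim, eps=0.05, lr=0.1, steps=2000,
                          batch_size=64, seed=0):
    """Simplified switching rule: constraint step if |gap| > eps, else loss step."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gap, g_gap = parity_gap_and_grad(w, batch)
        if abs(gap) > eps:
            sign = 1.0 if gap > 0 else -1.0  # subgradient of |gap|
            step = [sign * gj for gj in g_gap]
        else:
            step = loss_grad(w, batch)
        w = [wj - lr * sj for wj, sj in zip(w, step)]
    return w


def make_toy_data(n=400, seed=1):
    """Hypothetical synthetic data in which feature 0 correlates with group a."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = rng.randint(0, 1)
        x = [rng.gauss(1.0 if a == 1 else -1.0, 1.0), 1.0]  # feature plus bias term
        y = 1 if x[0] + rng.gauss(0.0, 1.0) > 0 else 0
        data.append((x, y, a))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    w = switching_subgradient(data, dim=2)
    gap, _ = parity_gap_and_grad(w, data)
    print("weights:", w, "full-data parity gap:", gap)
```

The constraint here is evaluated on the same mini-batch that drives the update, so the estimate is noisy; this is one of the difficulties that makes stochastic constrained training harder than its deterministic counterpart.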
Peer Reviews
Decision: ICLR 2026 Poster
1. This work provides a reproducible and extensible benchmark framework for fairness-constrained deep learning, filling a gap in the literature where no unified platform existed.
2. The writing is very clear, and the notations are consistent. I appreciate Table 3, where the authors review a wide range of stochastic constrained optimization algorithms with a structured taxonomy and theoretical assumptions.
3. The work evaluates multiple fairness criteria, independence, separation, sufficiency, a
1. The paper primarily implements existing algorithms rather than introducing a new one. While benchmarking is valuable, this may limit perceived theoretical contribution.
2. Only one dataset with a binary protected attribute is used. The scalability and generalization to multiple attributes have not been tested.
3. The presentation of the experimental results can be improved. The current figures are difficult to read.
4. There is no discussion of hyperparameter search across algorithms. Why are
-- The paper has included a lot of experiments on different algorithmic (stochastic) variants of implementing fairness as a constraint during training. Methods considered include: (i) Stochastic ghost method; (ii) Stochastic smoothed and linearized AL method; (iii) Stochastic switching subgradient method, etc.
-- They consider the three popular fairness notions, and also multiple datasets.
-- Presentation is generally good.
-- While the vast experiments are highly appreciated, I believe this paper is more suitable as a dataset/benchmark paper. The stochastic optimization algorithms already exist in the literature and have also been used for constraint optimization. The paper applies these constrained optimization variants for the specific constraint of group fairness and studies their performance.
-- Indeed, the paper is quite comprehensive in their experimentation. But, still, the novelty would be limited for su
1- Evaluates four distinct fairness-constrained optimization algorithms under identical experimental setups, providing valuable comparative insights.
2- Offers a transparent, well-engineered implementation with all datasets, hyperparameters, and metrics clearly documented.
3- Bridges fairness theory with realistic deep learning setups, enabling reproducible fairness experiments on real data.
4- Covers three key fairness notions (independence, separation, sufficiency) and links them to optimiz
1- Experiments are mainly conducted on a single dataset (ACSIncome), which restricts the scope of empirical validation. Inclusion of more varied domains (e.g., image or language tasks) would strengthen the claim of generality.
2- While results are reported, deeper analysis of when and why certain algorithms perform better (e.g., under which fairness metrics or subgroup imbalances) is missing.
3- Although billions of potential subgroup combinations are mentioned, the experiments do not convinci
Taxonomy
Topics: Stochastic Gradient Optimization Techniques
