TL;DR
This paper introduces an adaptive stopping algorithm that certifies when further search for less discriminatory algorithms is unlikely to yield meaningful improvements, aiding compliance and fairness efforts.
Contribution
It formalizes the search for less discriminatory algorithms as an optimal stopping problem and provides a method to certify the sufficiency of search efforts.
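Here is a minimal sketch of the optimal-stopping framing in its most idealized form. It is not the paper's algorithm. It assumes that each retrain yields a disparity score $X$ drawn i.i.d. from a fully known distribution, taken here to be Uniform(0, 1), where lower means less discriminatory. It also assumes that $\gamma$ is the smallest expected gain that justifies one more retrain. A one-step-lookahead rule then stops as soon as the expected improvement $\mathbb{E}[(\text{best} - X)_+]$ from one more draw is at most $\gamma$. The function names are hypothetical.

```python
import random


def expected_improvement_uniform(best: float) -> float:
    """Expected gain E[(best - X)_+] from one more retrain, assuming X ~ Uniform(0, 1).

    Closed form: integral from 0 to best of (best - x) dx = best**2 / 2 for best in [0, 1].
    """
    b = min(max(best, 0.0), 1.0)  # clamp to the assumed support
    return b * b / 2.0


def search_until_stop(gamma: float, seed: int = 0, max_retrains: int = 10_000):
    """Simulate retraining until the one-step expected gain is at most gamma.

    Each call to rng.random() stands in (hypothetically) for the disparity of a
    model retrained with a fresh random seed.
    """
    rng = random.Random(seed)
    best = rng.random()  # disparity of the first trained model
    n = 1
    while expected_improvement_uniform(best) > gamma and n < max_retrains:
        best = min(best, rng.random())  # keep the least discriminatory model so far
        n += 1
    return n, best
```

Because the distribution is assumed known, the stopping check is exact. The rest of the paper's setting replaces this known quantity with bounds estimated from data.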
Findings
The algorithm provides high-probability bounds on potential fairness improvements.
Stronger assumptions lead to tighter bounds on the search process.
The method is validated on real-world credit and housing datasets.
Abstract
U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain…
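The LDA definition above pairs a business-objective constraint with a disparity comparison. The sketch below makes that concrete for a pool of retrained models. It assumes that accuracy stands in for the business objective and that a single scalar disparity measure (lower is better) stands in for disparate impact. It also assumes that "same business objectives" means accuracy within a hypothetical tolerance of the baseline. `RetrainedModel` and `find_lda` are illustrative names, not from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RetrainedModel:
    seed: int          # random seed used for this retrain
    accuracy: float    # proxy for the business objective (higher is better)
    disparity: float   # e.g. selection-rate gap between groups (lower is better)


def find_lda(
    models: List[RetrainedModel],
    baseline: RetrainedModel,
    tolerance: float = 0.005,
) -> Optional[RetrainedModel]:
    """Return the least discriminatory model whose accuracy is within `tolerance`
    of the baseline, or None if no such model has lower disparity than the baseline."""
    eligible = [m for m in models if m.accuracy >= baseline.accuracy - tolerance]
    if not eligible:
        return None
    best = min(eligible, key=lambda m: m.disparity)
    return best if best.disparity < baseline.disparity else None
```

For example, take a baseline with accuracy 0.80 and disparity 0.12. A retrain with accuracy 0.801 and disparity 0.09 would qualify as an LDA under this sketch. A retrain with disparity 0.03 but accuracy 0.79 would not, because it falls outside the tolerance. These numbers are made up for illustration.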
Peer Reviews
Decision: ICLR 2026 Poster
The paper is very easy to follow despite containing a lot of theory. To accomplish this, the manuscript starts from an ideal scenario: a known population risk $Q$ and a known marginal distribution of that risk, $P_0$. Each assumption is then relaxed in turn. First, acknowledging that $P_0$ is unknown means that we must find an upper bound on the expected improvement that holds with high probability. Further relaxing our knowledge of $Q$ to that of its empirical counterpart $\widehat{Q}$ is then exp
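An elementary argument shows how dropping knowledge of $P_0$ still permits a high-probability bound. This is a sketch under stated assumptions, not the paper's derivation. Assume retrain disparities $X_1, X_2, \dots$ are i.i.d. with a continuous CDF $F$ and a known lower bound $a$. Let $M_n = \min_{i \le n} X_i$ be the best disparity so far, and take $\mu(u) = \mathbb{E}[(u - X)_+]$ as the expected gain from one more retrain. Then, for $u \ge a$:

$$\mu(u) = \mathbb{E}\big[(u - X)\,\mathbf{1}\{X < u\}\big] \le (u - a)\,F(u)$$

Because $F$ is nondecreasing and each $F(X_i)$ is Uniform(0, 1), $F(M_n) = \min_i F(X_i) \sim \mathrm{Beta}(1, n)$, so $\Pr[F(M_n) > t] = (1 - t)^n$. Solving $(1 - t)^n = \delta$ gives, for a fixed $n$,

$$\Pr\Big[\mu(M_n) \le (M_n - a)\big(1 - \delta^{1/n}\big)\Big] \ge 1 - \delta.$$

This statement holds only for a single, pre-chosen $n$. Re-checking it after every retrain needs an anytime-valid correction.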
## Extending Experiments
While the presented experiments show that the algorithm is correct (the upper bound holds with probability at least 95% in general), they could be extended to highlight interesting trade-offs and applications beyond fairness. For instance, the appendix presents Algorithm 2 as an alternative that uses a subset of the trained models to estimate the upper bound on the conditional expected improvement $\overline{\mu}$. However, the tightness of this algorithm is n
- Theoretical novelty: The framing of LDA search as an optimal stopping problem is original and mathematically sound. The derivation of anytime-valid upper bounds for marginal gains extends prior work in statistical inference and stopping theory.
- Practical relevance: The work connects theoretical constructs to regulatory and compliance debates in algorithmic fairness, addressing a pressing question of how firms can demonstrate sufficient fairness efforts.
- Methodological rigor: The paper cl
- Limited empirical scope: The empirical evaluation, though methodologically correct, uses small-scale settings with standard datasets. There is limited evidence of robustness in larger or more complex model retraining pipelines.
- Assumption strength: Several theoretical results depend on distributional assumptions that may not hold in realistic ML training scenarios with non-iid retraining or adaptive hyperparameter tuning.
- Connection to fairness metrics: While the framework generalizes be
Turning “good-faith LDA search” into an auditable optimal-stopping problem with an explicit threshold $\gamma$ is a nice formulation. The adaptive rule provides high-probability upper bounds on the marginal gain from one more retrain, enabling a certificate that a search was “sufficient” at the data-dependent stopping time. The paper motivates the problem clearly, is well written, and is easy to follow.
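The adaptive rule described above can be illustrated with an elementary, distribution-free construction. It is swapped in here for illustration and is not the paper's bound. The construction assumes i.i.d. retrains whose disparity follows a continuous distribution bounded below by `lower`. It uses the order-statistic bound $\mu(M_n) \le (M_n - a)(1 - \delta_n^{1/n})$ and spends confidence $\delta_n = \delta / (n(n+1))$ at round $n$. Because these $\delta_n$ sum to $\delta$, a union bound keeps the certificate valid at whatever round the loop stops. `draw`, `improvement_bound`, and `adaptive_search` are hypothetical names.

```python
import math
import random


def improvement_bound(best: float, n: int, delta: float, lower: float = 0.0) -> float:
    """Upper bound on the expected gain of one more retrain after n i.i.d. retrains.

    Assumes a continuous disparity distribution with support bounded below by `lower`.
    Uses mu(best) <= (best - lower) * F(best) with F(best) ~ Beta(1, n), at level
    delta_n = delta / (n * (n + 1)) so the bounds hold for all n simultaneously.
    """
    delta_n = delta / (n * (n + 1))
    # 1 - delta_n ** (1 / n), computed stably via expm1
    return (best - lower) * -math.expm1(math.log(delta_n) / n)


def adaptive_search(draw, gamma: float, delta: float = 0.05,
                    lower: float = 0.0, max_retrains: int = 100_000):
    """Call `draw()` (one hypothetical retrain returning its disparity) until the
    certified expected gain of one more retrain is at most gamma."""
    best = draw()
    n = 1
    while improvement_bound(best, n, delta, lower) > gamma and n < max_retrains:
        best = min(best, draw())  # keep the least discriminatory model so far
        n += 1
    return n, best, improvement_bound(best, n, delta, lower)
```

For instance, `adaptive_search(random.Random(0).random, gamma=0.01)` simulates uniformly distributed disparities. The stopping time is data-dependent: a lucky early low-disparity model shrinks the bound faster. The union bound is conservative, and a tighter anytime-valid construction would be a natural substitute.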
1. **Bounding $\mu$ in Section 3.2.** The paper presents several upper bounds on $\mu(u)$ under different assumptions on the underlying density. It is not immediately clear how conservative these bounds are in practice. Could the authors comment on the **tightness** of these bounds (e.g., instances where they are known to be sharp vs. loose), and perhaps provide empirical or theoretical comparisons across the proposed choices?
2. **Online learning formulation (infinite-data regime).**
Taxonomy
Topics: Ethics and Social Impacts of AI · Financial Distress and Bankruptcy Prediction · AI and HR Technologies
