Toward Sufficient Statistical Power in Algorithmic Bias Assessment: A Test for ABROCA
Conrad Borchers

TL;DR
This paper examines the ABROCA fairness metric in educational data mining, showing that its distribution complicates significance testing and proposing nonparametric tests to make bias detection more reliable at typical sample sizes.
Contribution
It introduces robust significance testing methods for ABROCA, addressing distributional issues and power limitations in bias assessment within EDM.
Findings
ABROCA does not follow standard distributions, including those suited to skewed data.
Nonparametric randomization tests improve bias detection reliability.
Large sample sizes or effect sizes are needed for effective bias detection.
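The randomization idea in the findings above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that ABROCA is the integral of the absolute difference between two groups' ROC curves over the false positive rate, and that the randomization test shuffles group labels to build a null distribution. The function names (roc_points, tpr_at, abroca, permutation_test), the grid resolution, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
from bisect import bisect_right


def roc_points(y_true, scores):
    """Empirical ROC curve as (fpr, tpr) lists; tied scores share one point."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("each group needs both classes")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (s, y) in enumerate(pairs):
        tp += y
        fp += 1 - y
        # Emit a point only at the end of a run of tied scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != s:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def tpr_at(x, fpr, tpr):
    """Linearly interpolated TPR at false positive rate x (upper value at jumps)."""
    j = bisect_right(fpr, x) - 1
    if j >= len(fpr) - 1:
        return tpr[-1]
    x0, x1 = fpr[j], fpr[j + 1]
    return tpr[j] + (tpr[j + 1] - tpr[j]) * (x - x0) / (x1 - x0)


def abroca(y, s, g, grid=201):
    """Area between the ROC curves of groups g == 0 and g == 1 (trapezoid rule)."""
    a = [i for i, gi in enumerate(g) if gi == 0]
    b = [i for i, gi in enumerate(g) if gi == 1]
    fa, ta = roc_points([y[i] for i in a], [s[i] for i in a])
    fb, tb = roc_points([y[i] for i in b], [s[i] for i in b])
    xs = [k / (grid - 1) for k in range(grid)]
    d = [abs(tpr_at(x, fa, ta) - tpr_at(x, fb, tb)) for x in xs]
    return (sum(d) - 0.5 * (d[0] + d[-1])) / (grid - 1)


def permutation_test(y, s, g, n_perm=500, seed=0):
    """Assumed procedure: shuffle group labels and compare to the observed ABROCA."""
    rng = random.Random(seed)
    observed = abroca(y, s, g)
    g_perm = list(g)
    hits = valid = 0
    for _ in range(n_perm):
        rng.shuffle(g_perm)
        try:
            stat = abroca(y, s, g_perm)
        except ValueError:
            continue  # a shuffled group lacked one class; skip this draw
        valid += 1
        hits += stat >= observed
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (valid + 1)


# Synthetic data (illustrative only): scores are noisier for group 1.
rng = random.Random(1)
g_demo = [rng.randint(0, 1) for _ in range(300)]
y_demo = [rng.randint(0, 1) for _ in range(300)]
s_demo = [yi + rng.gauss(0, 0.5 + 0.5 * gi) for yi, gi in zip(y_demo, g_demo)]
obs, p = permutation_test(y_demo, s_demo, g_demo, n_perm=200)
print(f"ABROCA = {obs:.3f}, permutation p-value = {p:.3f}")
```

Because the null distribution is built from the data itself, this kind of test does not require ABROCA to follow any named distribution, which is the property the findings emphasize. The sketch says nothing about power; estimating that would require repeating the test over many simulated datasets with a known effect.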
Abstract
Algorithmic bias is a pressing concern in educational data mining (EDM), as it risks amplifying inequities in learning outcomes. The Area Between ROC Curves (ABROCA) metric is frequently used to measure discrepancies in model performance across demographic groups to quantify overall model fairness. However, its skewed distribution, especially when class or group imbalances exist, makes significance testing challenging. This study investigates ABROCA's distributional properties and contributes robust methods for its significance testing. Specifically, we address (1) whether ABROCA follows any known distribution, (2) how to reliably test for algorithmic bias using ABROCA, and (3) the statistical power achievable with ABROCA-based bias assessments under typical EDM sample specifications. Simulation results confirm that ABROCA does not match standard distributions, including those suited to…
Taxonomy
Topics: Bayesian Modeling and Causal Inference
