As Biased as You Measure: Methodological Pitfalls of Bias Evaluations in Speaker Verification Research
Wiebke Hutiri, Tanvina Patel, Aaron Yi Ding, Odette Scharenborg

TL;DR
This paper investigates how measurement choices affect bias evaluation outcomes in speaker verification, revealing that metric selection and aggregation methods significantly influence results and proposing best practices for more consistent bias assessment.
Contribution
It empirically demonstrates the impact of measurement choices on bias evaluation outcomes and recommends the use of ratio-based bias measures for more reliable comparisons.
Findings
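Bias evaluation outcomes depend strongly on the choice of base metric, of ratio- or difference-based bias measure, and of aggregation method.
Ratio-based bias measures are recommended, particularly when base metric values are small or when base metrics of different orders of magnitude are compared.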
Contradictory conclusions across studies are partly due to methodological differences.
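To see why ratio-based measures behave differently when base metric values are small, the sketch below compares a difference-based and a ratio-based bias measure. It computes both between a group's false positive rate (FPR) and a reference FPR at two hypothetical operating points that share the same absolute gap. The function names and numbers are illustrative assumptions, not the paper's code, definitions, or results.

```python
def difference_bias(group_value: float, reference_value: float) -> float:
    """Hypothetical difference-based bias: absolute gap between group and reference."""
    return group_value - reference_value


def ratio_bias(group_value: float, reference_value: float) -> float:
    """Hypothetical ratio-based bias: group value relative to the reference."""
    if reference_value == 0:
        raise ValueError("reference value must be non-zero for a ratio measure")
    return group_value / reference_value


# Two made-up operating points with the same absolute FPR gap of 0.001
low_fpr = {"group": 0.002, "reference": 0.001}
high_fpr = {"group": 0.101, "reference": 0.100}

for name, m in [("low FPR", low_fpr), ("high FPR", high_fpr)]:
    d = difference_bias(m["group"], m["reference"])
    r = ratio_bias(m["group"], m["reference"])
    print(f"{name}: difference={d:.4f}, ratio={r:.3f}")
```

Both cases give a difference of 0.001. The ratio, however, is 2.0 at the low operating point and 1.01 at the high one, so a difference measure hides that one group's error rate has doubled.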
Abstract
Detecting and mitigating bias in speaker verification systems is important, as datasets, processing choices and algorithms can lead to performance differences that systematically favour some groups of people while disadvantaging others. Prior studies have thus measured performance differences across groups to evaluate bias. However, when comparing results across studies, it becomes apparent that they draw contradictory conclusions, hindering progress in this area. In this paper we investigate how measurement impacts the outcomes of bias evaluations. We show empirically that bias evaluations are strongly influenced by base metrics that measure performance, by the choice of ratio or difference-based bias measure, and by the aggregation of bias measures into meta-measures. Based on our findings, we recommend the use of ratio-based bias measures, in particular when the values of base…
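To illustrate why aggregating bias measures into a meta-measure matters, the sketch below scores two hypothetical systems. Each system has per-group FPRs, which are compared against an overall reference FPR. It then aggregates the per-group ratio deviations |group / reference - 1| with two different meta-measures: the worst-case group and the mean over groups. The group values, the deviation formula, and both meta-measures are illustrative assumptions, not the paper's definitions.

```python
from statistics import mean


def ratio_deviations(group_values: dict[str, float], reference: float) -> dict[str, float]:
    """Per-group ratio-based bias, expressed as distance of group/reference from 1."""
    return {g: abs(v / reference - 1.0) for g, v in group_values.items()}


def worst_case_meta(devs: dict[str, float]) -> float:
    """Meta-measure 1: the largest deviation of any single group."""
    return max(devs.values())


def mean_meta(devs: dict[str, float]) -> float:
    """Meta-measure 2: the average deviation across all groups."""
    return mean(devs.values())


reference_fpr = 0.01  # hypothetical overall FPR used as the reference
# System A: one group strongly disadvantaged, the others at parity
system_a = {"g1": 0.015, "g2": 0.010, "g3": 0.010, "g4": 0.010}
# System B: every group moderately off the reference
system_b = {"g1": 0.013, "g2": 0.013, "g3": 0.0077, "g4": 0.0077}

devs_a = ratio_deviations(system_a, reference_fpr)
devs_b = ratio_deviations(system_b, reference_fpr)

for meta in (worst_case_meta, mean_meta):
    a, b = meta(devs_a), meta(devs_b)
    more_biased = "A" if a > b else "B"
    print(f"{meta.__name__}: A={a:.3f}, B={b:.3f} -> system {more_biased} looks more biased")
```

With these numbers, the worst-case meta-measure rates system A as more biased (0.5 vs 0.3), while the mean rates system B as more biased (about 0.265 vs 0.125). The same per-group measurements can therefore support opposite conclusions depending on how they are aggregated.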
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Balanced Selection
