Shortcut Learning in Binary Classifier Black Boxes: Applications to Voice Anti-Spoofing and Biometrics

Md Sahidullah; Hye-jin Shim; Rosa Gonzalez Hautamäki; Tomi H. Kinnunen

arXiv:2601.17782 · cs.LG · January 27, 2026

TL;DR

This paper investigates shortcut learning in binary classifiers, especially in voice anti-spoofing and biometrics, proposing a new framework to analyze biases and their effects on classifier behavior.

Contribution

The paper introduces a framework that combines interventional and observational analyses with a linear mixed-effects model, applied post hoc to the scores of black-box classifiers, to analyze dataset and model biases.
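For readers unfamiliar with the model class, a linear mixed-effects model of classifier scores can be written in the generic random-intercept form below. This is a textbook sketch, not the paper's exact specification: the symbols $s_{ij}$, $\mathbf{x}_{ij}$, and $u_j$, and the choice of a single grouping factor, are assumptions made for illustration.

```latex
\begin{align}
s_{ij} &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Here $s_{ij}$ is the classifier score for trial $i$ in group $j$, $\mathbf{x}_{ij}$ collects fixed-effect covariates (for instance, an indicator for a suspected bias factor), $u_j$ is a group-level random intercept, and $\varepsilon_{ij}$ is residual noise. Under this reading, a fixed-effect coefficient that is clearly non-zero for a nuisance covariate would suggest the scores depend on that covariate.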

Findings

01. Effective analysis of dataset biases in voice anti-spoofing and biometrics

02. Insights into how biases influence classifier decisions beyond error rates

03. Framework applicable to other domains for bias detection

Abstract

The widespread adoption of deep-learning models in data-driven applications has drawn attention to the potential risks associated with biased datasets and models. Neglected or hidden biases within datasets and models can lead to unexpected results. This study addresses the challenges of dataset bias and explores "shortcut learning", or the "Clever Hans effect", in binary classifiers. We propose a novel framework for analyzing black-box classifiers and for examining the impact of both training and test data on classifier scores. Our framework incorporates intervention and observational perspectives, employing a linear mixed-effects model for post-hoc analysis. By evaluating classifier performance beyond error rates, we aim to provide insights into biased datasets and offer a comprehensive understanding of their influence on classifier behavior. The effectiveness of our approach is…
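The intervention perspective mentioned in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or data: `black_box_score`, its weights, the 90%/10% cue rates, and the idea of "removing" the cue are all hypothetical choices made only to show how a shortcut can inflate the score gap between classes.

```python
import random
import statistics


def black_box_score(signal, cue):
    """Hypothetical scorer that leaks a nuisance cue (weights are assumptions)."""
    return 1.0 * signal + 2.0 * cue


def make_trials(n, seed=0):
    """Simulate biased test trials where the cue co-occurs with the positive class."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        label = rng.choice([0, 1])
        # Genuine class evidence: class means at +1 and -1, unit variance.
        signal = rng.gauss(1.0 if label == 1 else -1.0, 1.0)
        # Assumed bias: cue present in 90% of positives, 10% of negatives.
        cue_rate = 0.9 if label == 1 else 0.1
        cue = 1.0 if rng.random() < cue_rate else 0.0
        trials.append((label, signal, cue))
    return trials


def class_gap(trials, intervene=False):
    """Mean positive-class score minus mean negative-class score.

    With intervene=True the cue is zeroed for every trial, mimicking an
    intervention on the test data that removes the nuisance factor.
    """
    pos, neg = [], []
    for label, signal, cue in trials:
        score = black_box_score(signal, 0.0 if intervene else cue)
        (pos if label == 1 else neg).append(score)
    return statistics.mean(pos) - statistics.mean(neg)


trials = make_trials(2000)
gap_observed = class_gap(trials)                    # biased data as collected
gap_intervened = class_gap(trials, intervene=True)  # cue removed
shortcut_share = gap_observed - gap_intervened      # gap attributable to the cue

print(f"observed gap:   {gap_observed:.2f}")
print(f"intervened gap: {gap_intervened:.2f}")
print(f"shortcut share: {shortcut_share:.2f}")
```

In this toy setting the class separation shrinks once the cue is removed, which is the kind of score-level effect that an error rate alone may not reveal.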

Taxonomy

Topics: Speech Recognition and Synthesis · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning