Assessing the Impact of Speaker Identity in Speech Spoofing Detection

Anh-Tuan Dao; Driss Matrouf; Nicholas Evans

arXiv:2602.20805 · cs.SD · February 25, 2026

TL;DR

This paper investigates how speaker identity affects speech spoofing detection and proposes methods that either model or remove speaker information in the embeddings. Evaluated on four datasets, the speaker-invariant model lowers the average equal error rate (EER) by 17% compared to the baseline.

Contribution

The paper introduces the Speaker-Invariant Multi-Task (SInMT) framework, with two approaches to handling speaker information: one that models speaker identity within the embeddings and one that removes it.

Findings

01 The speaker-invariant model reduces the average equal error rate (EER) by 17% compared to the baseline.

02 For the most challenging attacks (e.g., A11), the EER reduction reaches up to 48%.

03 The proposed methods outperform the baseline models.

Abstract

Spoofing detection systems are typically trained using diverse recordings from multiple speakers, often assuming that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task (SInMT) framework, one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated using four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to 48% reduction for the most challenging attacks (e.g., A11).
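
To make the gradient reversal idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a scalar toy model with squared-error losses standing in for the real spoofing and speaker classification losses. The names `GradientReversal`, `encoder_gradient`, and the reversal strength `lam` are hypothetical and do not come from the paper. The layer passes the embedding through unchanged on the forward pass and negates (and scales) the speaker-task gradient on the backward pass, so the shared encoder is pushed away from features that help speaker recognition.

```python
from dataclasses import dataclass


@dataclass
class GradientReversal:
    """Identity on the forward pass; multiplies gradients by -lam on the backward pass."""

    lam: float = 1.0  # hypothetical reversal strength, not a value from the paper

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad_output: float) -> float:
        return -self.lam * grad_output


def encoder_gradient(x, w, a, b, y_spoof, y_spk, grl):
    """Gradient of the joint objective w.r.t. the encoder weight w (scalar toy model).

    Squared-error losses are an assumption made for brevity.
    """
    e = w * x                                   # shared embedding
    spoof_err = a * e - y_spoof                 # spoofing head residual
    spk_err = b * grl.forward(e) - y_spk        # speaker head sees e unchanged
    d_spoof_de = 2.0 * spoof_err * a            # d/de of the squared spoofing loss
    d_spk_de = grl.backward(2.0 * spk_err * b)  # speaker gradient, reversed by the layer
    return (d_spoof_de + d_spk_de) * x          # chain rule through e = w * x


grl = GradientReversal(lam=0.5)
grad = encoder_gradient(x=1.0, w=1.0, a=1.0, b=1.0, y_spoof=0.0, y_spk=0.0, grl=grl)
```

With `lam=0.0` the speaker branch contributes nothing to the encoder update. Larger values of `lam` penalize speaker-discriminative embeddings more strongly. In an autograd framework the same behavior would typically be written as a custom function with an overridden backward pass.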

Taxonomy

Topics: Speech Recognition and Synthesis · Biometric Identification and Security · Speech and Audio Processing