Low-Cost Detection of Degraded Voice Clones via Source-Output Acoustic Consistency

Jana Shokr; Minos Papadopoulos; Jeremy Cooperstock; Pavo Orepic

arXiv:2605.08165·eess.AS·May 12, 2026

TL;DR

This paper shows that simple, interpretable acoustic features such as fundamental frequency (f0) and Harmonics-to-Noise Ratio (HNR) can detect degraded synthetic voices, supporting quick rejection of failed outputs in sensitive applications.

Contribution

It introduces a lightweight, threshold-based detection method using source-output acoustic features for identifying failed voice synthesis outputs.
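A minimal sketch of what a threshold-based source-output check could look like, assuming per-frame f0 tracks and a single HNR value have already been extracted for both the source recording and the cloned output. The function names, the semitone scale, the OR-combination of the two flags, and both threshold values are illustrative assumptions, not the decision rule or settings reported in the paper.

```python
import math
from statistics import median


def median_f0(f0_track_hz):
    """Median f0 over voiced frames; unvoiced frames are assumed to be 0 or None."""
    voiced = [f for f in f0_track_hz if f and f > 0]
    if not voiced:
        raise ValueError("no voiced frames in f0 track")
    return median(voiced)


def f0_shift_semitones(source_f0_hz, output_f0_hz):
    """Signed difference between output and source median f0, in semitones."""
    return 12.0 * math.log2(median_f0(output_f0_hz) / median_f0(source_f0_hz))


def is_degraded(source_f0_hz, output_f0_hz, source_hnr_db, output_hnr_db,
                max_f0_shift_st=2.0, max_hnr_drop_db=5.0):
    """Flag an output whose f0 or HNR departs too far from the source.

    Both thresholds are hypothetical placeholders; in practice they would be
    tuned on human-labeled samples.
    """
    f0_flag = abs(f0_shift_semitones(source_f0_hz, output_f0_hz)) > max_f0_shift_st
    hnr_flag = (source_hnr_db - output_hnr_db) > max_hnr_drop_db
    return f0_flag or hnr_flag
```

Because each flag depends on one scalar comparison, a rejected sample can be traced back to the feature that triggered it, which is the kind of interpretability a first-pass filter benefits from.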

Findings

01. f0 and HNR achieved over 85% accuracy for WaveRNN.

02. HNR outperformed other features for HiFi-GAN detection.

03. Source-output features capture distinct failure patterns.

Abstract

Recent advances in generative speech have increased the need for automatic detection of obviously failed synthetic outputs. This is particularly important in clinical settings such as AVATAR therapy, in which schizophrenia patients engage with a computer-generated representation of their hallucinated voices and degraded synthesis may disrupt immersion and therapeutic engagement. We investigate whether low-dimensional, interpretable source-output acoustic features can provide a lightweight first-pass detector of degraded voice-cloning outputs. Motivated by source-filter models of speech, we first test median fundamental frequency (f0) as a source-related consistency measure, and compare it with vocal tract length (VTL) as a filter-related measure and Harmonics-to-Noise Ratio (HNR) as a noise-related descriptor. Human-labeled voice-cloning samples generated with two vocoder families,…
