Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks

Prajwal Chinchmalatpure; Suyash Chinchmalatpure; Siddharth Chavan

arXiv:2601.04227·cs.SD·January 9, 2026

TL;DR

This paper presents a real-time detection system for AI-generated voice conversion attacks, using acoustic features and machine learning to distinguish authentic speech from RVC-generated deepfakes in streaming audio.

Contribution

It introduces a low-latency, segment-based detection method for RVC voice conversion, emphasizing realistic audio conditions and practical deployment considerations.

Findings

01. Short-window acoustic features effectively distinguish RVC-converted speech from authentic speech.

02. The detection system performs reliably even in noisy environments.

03. The system supports both segment-level and call-level classification for real-time use.
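
The third finding distinguishes segment-level from call-level decisions. The listing does not say how the two are combined, so the following is only an illustrative sketch: it assumes each one-second segment already has a deepfake probability from some classifier, and it averages those probabilities into a running call-level score. The function names, the averaging rule, and the 0.5 threshold are all hypothetical choices, not taken from the paper.

```python
from typing import Iterable, List


def segment_labels(segment_probs: Iterable[float], threshold: float = 0.5) -> List[bool]:
    """Segment-level decision: flag each segment whose probability meets the threshold.

    The threshold value is an assumption made for this sketch.
    """
    return [p >= threshold for p in segment_probs]


def running_call_scores(segment_probs: Iterable[float]) -> List[float]:
    """Call-level score after each segment, as the mean of all probabilities so far.

    Averaging is an assumed aggregation rule; majority voting or a
    smoothed score would be equally plausible alternatives.
    """
    scores: List[float] = []
    total = 0.0
    for count, prob in enumerate(segment_probs, start=1):
        total += prob
        scores.append(total / count)
    return scores


def call_label(segment_probs: Iterable[float], threshold: float = 0.5) -> bool:
    """Call-level decision based on the final running score."""
    scores = running_call_scores(segment_probs)
    if not scores:
        raise ValueError("at least one segment probability is required")
    return scores[-1] >= threshold
```

In a streaming setting, `running_call_scores` would be updated once per second, so a call-level alert could be raised before the call ends.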

Abstract

Generative audio technologies now enable highly realistic voice cloning and real-time voice conversion, increasing the risk of impersonation, fraud, and misinformation in communication channels such as phone and video calls. This study investigates real-time detection of AI-generated speech produced using Retrieval-based Voice Conversion (RVC), evaluated on the DEEP-VOICE dataset, which includes authentic and voice-converted speech samples from multiple well-known speakers. To simulate realistic conditions, deepfake generation is applied to isolated vocal components, followed by the reintroduction of background ambiance to suppress trivial artifacts and emphasize conversion-specific cues. We frame detection as a streaming classification task by dividing audio into one-second segments, extracting time-frequency and cepstral features, and training supervised machine learning models to…
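
The streaming framing described in the abstract (one-second segments, then time-frequency and cepstral features) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: the specific features (spectral centroid, RMS energy, zero-crossing rate, and the first few real-cepstrum coefficients), the function names, and the choice of non-overlapping segments are all hypothetical.

```python
import numpy as np


def split_into_segments(audio: np.ndarray, sample_rate: int,
                        segment_seconds: float = 1.0) -> np.ndarray:
    """Split a mono signal into non-overlapping fixed-length segments.

    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(round(sample_rate * segment_seconds))
    n_segments = len(audio) // seg_len
    return audio[: n_segments * seg_len].reshape(n_segments, seg_len)


def segment_features(segment: np.ndarray, sample_rate: int,
                     n_cepstral: int = 13) -> np.ndarray:
    """Compute a small illustrative feature vector for one segment."""
    eps = 1e-10  # avoids division by zero and log(0)
    windowed = segment * np.hanning(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)

    # Time-frequency style descriptors.
    centroid = np.sum(freqs * magnitude) / (np.sum(magnitude) + eps)
    rms = np.sqrt(np.mean(segment ** 2))
    zcr = np.mean(np.abs(np.diff(np.signbit(segment).astype(int))))

    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.irfft(np.log(magnitude + eps))

    return np.concatenate(([centroid, rms, zcr], cepstrum[:n_cepstral]))


def extract_features(audio, sample_rate: int, n_cepstral: int = 13) -> np.ndarray:
    """Return one feature row per one-second segment."""
    signal = np.asarray(audio, dtype=float)
    segments = split_into_segments(signal, sample_rate)
    if len(segments) == 0:
        return np.empty((0, 3 + n_cepstral))
    return np.stack([segment_features(s, sample_rate, n_cepstral) for s in segments])
```

Each row of the resulting matrix could then be passed to any supervised classifier; which models the paper trains is not visible in this truncated abstract.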

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing