RTCFake: Speech Deepfake Detection in Real-Time Communication

Jun Xue; Zhuolin Yi; Yihuan Huang; Yanzhen Ren; Yujie Chen; Cunhang Fan; Zicheng Su; Yonghong Zhang; Bo Cai

arXiv:2604.23742 · cs.SD · April 28, 2026

TL;DR

This paper introduces RTCFake, a large-scale speech deepfake dataset for real-time communication, and proposes a phoneme-guided consistency learning strategy to improve detection robustness across platforms and noise conditions.

Contribution

It presents the first RTC-specific speech deepfake dataset and a novel phoneme-guided consistency learning (PCL) method that improves cross-platform generalization and robustness to noise.

Findings

01

The RTCFake dataset contains approximately 600 hours of speech transmitted through multiple platforms.

02

The PCL strategy significantly improves cross-platform generalization.

03

The approach enhances robustness against complex noise and unknown speech enhancement processes.
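The findings credit PCL with the cross-platform and noise-robustness gains, but this page does not give its loss function. As a rough illustration only, the Python sketch below assumes one plausible reading of "phoneme-guided consistency": frame embeddings of an offline utterance and of its RTC-transmitted copy are averaged per phoneme label, and the model is penalized when the pooled vectors differ. The helper names (`pool_by_phoneme`, `phoneme_consistency_loss`, `pcl_style_objective`), the squared-distance metric, the label convention, and the `weight` value are all hypothetical, and frame-level phoneme labels are assumed to come from an external aligner. This is not the authors' formulation.

```python
import math
from collections import defaultdict

# Hypothetical sketch: NOT the RTCFake authors' PCL loss.


def pool_by_phoneme(frames, phoneme_ids):
    """Average the frame embeddings that share a phoneme label.

    frames: list of equal-length float vectors, one per frame.
    phoneme_ids: one label per frame (assumed to come from a forced aligner).
    """
    sums = {}
    counts = defaultdict(int)
    for vec, ph in zip(frames, phoneme_ids):
        acc = sums.setdefault(ph, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[ph] += 1
    return {ph: [s / counts[ph] for s in acc] for ph, acc in sums.items()}


def phoneme_consistency_loss(offline_frames, online_frames, offline_ph, online_ph):
    """Mean per-dimension squared distance between phoneme-pooled embeddings
    of an offline utterance and its transmitted (online) counterpart."""
    off = pool_by_phoneme(offline_frames, offline_ph)
    on = pool_by_phoneme(online_frames, online_ph)
    shared = sorted(set(off) & set(on))  # only phonemes present in both versions
    if not shared:
        return 0.0
    total = 0.0
    for ph in shared:
        diffs = [(a - b) ** 2 for a, b in zip(off[ph], on[ph])]
        total += sum(diffs) / len(diffs)
    return total / len(shared)


def binary_cross_entropy(prob_fake, label):
    """Standard BCE for one prediction; label 1 = fake, 0 = bona fide (assumed convention)."""
    eps = 1e-7
    p = min(max(prob_fake, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def pcl_style_objective(prob_fake, label, offline_frames, online_frames,
                        offline_ph, online_ph, weight=0.5):
    """Detection loss plus a weighted consistency term (weight is an arbitrary placeholder)."""
    consistency = phoneme_consistency_loss(offline_frames, online_frames, offline_ph, online_ph)
    return binary_cross_entropy(prob_fake, label) + weight * consistency


if __name__ == "__main__":
    offline = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    online = [[1.0, 1.0], [0.0, 2.0]]
    print(phoneme_consistency_loss(offline, online, ["a", "a", "b"], ["a", "b"]))  # 0.25
```

In the toy example, phoneme `a` pools to `[1.0, 0.0]` offline and `[1.0, 1.0]` online (per-dimension squared distance 0.5), while `b` matches exactly, so the term averages to 0.25. A training implementation would compute the same quantities on framework tensors so gradients flow through the consistency term; the pure-Python version only makes the arithmetic explicit.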

Abstract

With the rapid advancement of speech generation technologies, the threat posed by speech deepfakes in real-time communication (RTC) scenarios has intensified. However, existing detection studies mainly focus on offline simulations and struggle to cope with the complex distortions introduced during RTC transmission, including unknown speech enhancement processes (e.g., noise suppression) and codec compression. To address this challenge, we present the first large-scale speech deepfake dataset tailored for RTC scenarios, termed RTCFake, totaling approximately 600 hours. The dataset is constructed by transmitting speech through multiple mainstream social media and conferencing platforms (e.g., Zoom), enabling precise pairing between offline and online speech. In addition, we propose a phoneme-guided consistency learning (PCL) strategy that forces models to learn…
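The abstract says offline and online speech are precisely paired but does not describe how. A common generic step when matching a source recording to its transmitted copy is to estimate the transmission delay by cross-correlation and crop the received signal accordingly. The Python sketch below illustrates that idea under stated assumptions: `estimate_delay` and `align_pair` are hypothetical names, both signals are assumed to share one sample rate, and the delay is assumed to be a single non-negative integer offset. It is not the RTCFake construction procedure.

```python
# Hypothetical sketch: one generic way to pair a reference recording with its
# transmitted copy. NOT taken from the RTCFake construction pipeline.


def estimate_delay(reference, received, max_lag):
    """Return the non-negative lag (in samples) that maximizes the raw
    cross-correlation sum(reference[n] * received[n + lag])."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        overlap = min(len(reference), len(received) - lag)
        if overlap <= 0:
            break  # received signal is too short for larger lags
        score = sum(reference[n] * received[n + lag] for n in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_pair(reference, received, max_lag):
    """Shift `received` by the estimated delay and crop it to the reference length."""
    lag = estimate_delay(reference, received, max_lag)
    return lag, received[lag:lag + len(reference)]


if __name__ == "__main__":
    ref = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0]
    rx = [0.0, 0.0, 0.0] + ref + [0.0]  # simulated 3-sample transmission delay
    print(align_pair(ref, rx, max_lag=5))  # (3, [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0])
```

The brute-force loop costs O(N·L) and is only meant to be readable; for real audio, an FFT-based correlation such as `scipy.signal.correlate` with `method="fft"` would be the usual choice. Real RTC paths may also introduce clock drift and enhancement that alters the waveform, so a single global offset is a simplification.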

Code & Models

Repositories

https://huggingface.co/datasets/JunXueTech/RTCFake
GitHub