Utilizing Speaker Profiles for Impersonation Audio Detection
Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, and Xiaohui Zhang

TL;DR
This paper introduces a novel approach for detecting impersonation audio by leveraging speaker profiles, and presents a large-scale Chinese dataset to facilitate research in this underexplored area.
Contribution
It proposes integrating speaker profiles into impersonation detection and provides the first large-scale Chinese dataset for impersonation audio research.
Findings
Speaker profiles improve detection accuracy.
Existing methods struggle with impersonation detection.
The dataset reveals the challenges in impersonation audio detection.
Abstract
Fake audio detection is an emerging and active topic. A growing body of literature aims to detect fake utterances, which are mostly generated by text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain underexplored. Impersonation is a type of fake audio in which an imitator replicates specific traits and the speech style of a target speaker. Unlike TTS and VC, which often leave digital traces or signal artifacts, impersonation involves live human beings producing entirely natural speech, which makes detecting impersonation audio a challenging task. Thus, we propose a novel method that integrates speaker profiles into the process of impersonation audio detection. Speaker profiles are inherent characteristics that are challenging for impersonators to mimic accurately, such as the speaker's age and job. We aim to leverage these features to…
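The abstract excerpt does not say how speaker profiles are combined with the audio signal. As an illustration only, one simple way to picture the idea is to encode profile attributes (age, job) as a vector and concatenate it with an audio embedding before scoring. Everything in the sketch below is an assumption made for illustration: the job vocabulary, the embedding size, the random weights, and the function names encode_profile, fuse, and score are hypothetical, and the concatenation-plus-logistic scoring is a stand-in technique, not the authors' model.

```python
import math
import random

# Hypothetical job vocabulary; the paper's actual profile fields may differ.
JOBS = ["teacher", "actor", "engineer"]


def encode_profile(age, job):
    """Encode a speaker profile as [scaled age, one-hot job]."""
    one_hot = [1.0 if candidate == job else 0.0 for candidate in JOBS]
    return [age / 100.0] + one_hot


def fuse(audio_embedding, profile):
    """Concatenate the audio embedding and the profile vector (assumed fusion)."""
    return list(audio_embedding) + list(profile)


def score(features, weights, bias=0.0):
    """Logistic score in (0, 1) from a linear combination of the fused features."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


random.seed(0)
# Stand-in for an embedding produced by some audio encoder (assumed size 8).
audio_embedding = [random.gauss(0.0, 1.0) for _ in range(8)]
profile = encode_profile(45, "actor")
features = fuse(audio_embedding, profile)
# Untrained random weights; a real detector would learn these from labeled data.
weights = [random.gauss(0.0, 1.0) for _ in features]
p = score(features, weights)
```

With trained weights, p could be read as the toy model's estimate that an utterance is impersonated; with the random weights used here it only demonstrates the data flow.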
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
