An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio
Xinrui Yan, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Haoxin Ma, Tao Wang, Shiming Wang, Ruibo Fu

TL;DR
This paper introduces the problem of detecting vocoder fingerprints in fake audio, that is, identifying which vocoder generated a given clip. Initial experiments suggest that audio from different vocoders is separable in feature space.
Contribution
It proposes the novel problem of vocoder fingerprint detection in fake audio and explores initial features and models for this purpose.
Findings
Distinct vocoder fingerprints can be visualized using t-SNE.
Preliminary features and models show potential in differentiating vocoders.
Experiments were conducted on datasets synthesized by eight state-of-the-art vocoders.
Abstract
Many effective attempts have been made at fake audio detection. However, they provide only detection results and no countermeasures to curb the harm. Many practical applications also need to know which model or algorithm generated the fake audio. We therefore propose a new problem: detecting vocoder fingerprints of fake audio. Experiments are conducted on datasets synthesized by eight state-of-the-art vocoders, and we present a preliminary exploration of features and model architectures. The t-SNE visualization shows that different vocoders generate distinct vocoder fingerprints.
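Illustrative sketch
One way to read the problem stated in the abstract is as closed-set classification: given a feature vector extracted from a fake clip, predict which of the known vocoders produced it. The sketch below is a minimal illustration under that assumption and is not the authors' method. The feature vectors are synthetic Gaussian clusters, the nearest-centroid classifier is a stand-in for the unspecified model architectures, and every function name (`make_synthetic_features`, `fit_centroids`, `predict`) is hypothetical.

```python
import random


def make_synthetic_features(num_vocoders=8, per_class=20, dim=4, seed=0):
    """Generate toy (vector, label) pairs: one Gaussian cluster per hypothetical vocoder."""
    rng = random.Random(seed)
    data = []
    for label in range(num_vocoders):
        # Each hypothetical vocoder gets its own cluster centre (a toy "fingerprint").
        centre = [rng.uniform(-10, 10) for _ in range(dim)]
        for _ in range(per_class):
            data.append(([c + rng.gauss(0, 0.5) for c in centre], label))
    return data


def fit_centroids(samples):
    """Compute the mean feature vector for each label."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest to vec (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


# Toy usage: fit on synthetic data and measure accuracy on the same data.
data = make_synthetic_features()
centroids = fit_centroids(data)
accuracy = sum(predict(centroids, v) == y for v, y in data) / len(data)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

With real audio, the synthetic vectors would be replaced by acoustic features extracted from each clip, and accuracy would be measured on held-out data rather than on the training set.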
