Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts

Chengzhe Sun; Shan Jia; Shuwei Hou; Ehab AlBadawy; Siwei Lyu

arXiv:2302.09198 · cs.SD · April 28, 2023

TL;DR

This paper proposes a method for detecting AI-synthesized human voices by identifying neural vocoder artifacts. It uses a multi-task learning framework that improves classification accuracy.

Contribution

It introduces a multi-task learning approach that leverages vocoder artifact detection to enhance synthetic voice detection performance.

Findings

1. The proposed model achieves high classification accuracy.
2. Detecting vocoder artifacts improves the discrimination between synthetic and real voices.
3. Multi-task learning constrains feature extraction, which improves results.

Abstract

Advances in AI-synthesized human voices have introduced a growing threat of impersonation and disinformation. It is therefore of practical importance to develop detection methods for synthetic human voices. This work proposes a new approach to detect synthetic human voices based on identifying artifacts of neural vocoders in audio signals. A neural vocoder is a specially designed neural network that synthesizes waveforms from time-frequency representations, e.g., mel-spectrograms. The neural vocoder is a core component in most DeepFake audio synthesis models. Hence, detecting neural vocoder processing indicates that an audio sample may have been synthesized. To take advantage of the vocoder artifacts for synthetic human voice detection, we introduce a multi-task learning framework for a binary-class RawNet2 model that shares the front-end feature extractor with a…
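
The multi-task idea can be illustrated as a combined training objective. This is a minimal, hypothetical sketch, not the authors' code. It assumes that the shared RawNet2 front-end feeds two heads: a single real-vs-synthetic logit, and a vector of vocoder-class logits that is assumed to include one class for unprocessed (real) speech. It further assumes that the two losses are summed with a weighting factor `weight`, whose default of 0.5 is an arbitrary placeholder. All function names are illustrative.

```python
import math
from typing import Sequence


def binary_cross_entropy(logit: float, label: int) -> float:
    """BCE for one sample from a raw logit (label 1 = synthetic, 0 = real).

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Multi-class cross-entropy for one sample via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(binary_logit: float, vocoder_logits: Sequence[float],
                   is_synthetic: int, vocoder_id: int,
                   weight: float = 0.5) -> float:
    """Hypothetical combined objective: real/fake BCE + weight * vocoder-ID CE.

    In a real model both heads would share one front-end, so gradients from
    the vocoder-identification term would also shape the shared features.
    """
    return (binary_cross_entropy(binary_logit, is_synthetic)
            + weight * cross_entropy(vocoder_logits, vocoder_id))


# Example: one synthetic clip labeled with (hypothetical) vocoder class 2 of 4.
loss = multitask_loss(1.3, [0.1, -0.4, 2.2, 0.0], is_synthetic=1, vocoder_id=2)
```

Because both terms would be backpropagated through the same front-end, the auxiliary vocoder term is intended to push the shared features toward encoding vocoder-specific artifacts. At test time, only the binary head would be needed to decide real versus synthetic.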

Code & Models

Repositories

csun22/librivoc-dataset (official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing