Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Rui Liu; Jinhua Zhang; Guanglai Gao; Haizhou Li

arXiv:2305.16353·cs.SD·May 29, 2023·1 cites

TL;DR

This paper introduces M2S-ADD, a novel audio DeepFake detection model that leverages mono-to-stereo conversion and dual-channel analysis to improve detection accuracy.

Contribution

It proposes a new approach using stereo cues via mono-to-stereo conversion and dual-branch neural networks for enhanced audio DeepFake detection.

Findings

01. Outperforms mono-input baselines on the ASVspoof2019 dataset

02. Effectively reveals artifacts in fake audio signals

03. Utilizes stereo information for improved detection accuracy

Abstract

Audio Deepfake Detection (ADD) is an emerging topic that aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay, and similar techniques. Traditionally, ADD models take a mono signal as input and focus on robust feature extraction and effective classifier design. However, the dual-channel stereo information in an audio signal also carries important cues for deepfake detection, which prior work has not studied. In this paper, we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process. We first project the mono signal to a stereo signal using a pretrained stereo synthesizer, then employ a dual-branch neural architecture to process the left and right channel signals, respectively. In this way, we effectively reveal the artifacts in the fake audio and thus improve ADD performance. The…
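The data flow described in the abstract (mono input, stereo synthesis, one branch per channel, fused decision) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `toy_stereo_synthesizer` is a hypothetical stand-in for the pretrained stereo synthesizer (it merely delays and attenuates one channel), `branch_features` replaces each neural branch with two hand-picked statistics, and the fusion weights are untrained placeholders.

```python
import math
from typing import List, Sequence, Tuple

Signal = List[float]


def toy_stereo_synthesizer(
    mono: Sequence[float], delay: int = 2, gain: float = 0.8
) -> Tuple[Signal, Signal]:
    """Hypothetical stand-in for the pretrained stereo synthesizer.

    Left channel: the mono signal unchanged.
    Right channel: a delayed, attenuated copy of the mono signal.
    """
    delay = max(0, min(delay, len(mono)))
    left = list(mono)
    right = [0.0] * delay + [gain * x for x in mono[: len(mono) - delay]]
    return left, right


def branch_features(channel: Sequence[float]) -> List[float]:
    """Hypothetical per-channel branch: mean energy and zero-crossing rate."""
    n = len(channel)
    if n == 0:
        raise ValueError("channel must contain at least one sample")
    energy = sum(x * x for x in channel) / n
    crossings = sum(1 for a, b in zip(channel, channel[1:]) if a * b < 0)
    zcr = crossings / max(n - 1, 1)
    return [energy, zcr]


def fuse_and_score(
    left_feat: Sequence[float],
    right_feat: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Concatenate both branches and map them to a score in (0, 1).

    The meaning of the score (e.g. probability of being fake) is an
    assumption of this sketch; the weights here are not trained.
    """
    fused = list(left_feat) + list(right_feat)
    logit = bias + sum(w * f for w, f in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))


def m2s_add_sketch(
    mono: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Mono -> stereo -> dual-branch features -> fused score."""
    left, right = toy_stereo_synthesizer(mono)
    return fuse_and_score(
        branch_features(left), branch_features(right), weights, bias
    )


if __name__ == "__main__":
    mono = [math.sin(0.3 * i) for i in range(100)]
    # Four placeholder weights: two features per branch, two branches.
    print(m2s_add_sketch(mono, weights=[0.5, -0.5, 0.5, -0.5]))
```

In a real system each placeholder would be a learned component; the sketch only shows how the left and right channels are processed separately before a joint decision.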

Code & Models

Repositories

ai-s2-lab/m2s-add
PyTorch · Official

Taxonomy

Topics: Digital Media Forensic Detection · Music and Audio Processing · Speech and Audio Processing

Methods: Focus