Half-Truth: A Partially Fake Audio Detection Dataset

Jiangyan Yi; Ye Bai; Jianhua Tao; Haoxin Ma; Zhengkun Tian; Chenglong Wang; Tao Wang; Ruibo Fu

arXiv:2104.03617·cs.SD·December 19, 2023·1 cites

TL;DR

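This paper introduces the half-truth audio detection (HAD) dataset, which contains partially fake audio in which only a few words of an utterance are manipulated. The dataset supports both detecting fake utterances and localizing the manipulated regions, and its benchmark results show that such partial fakes are difficult for current fake audio detection methods.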

Contribution

The paper creates a novel dataset for half-truth fake audio detection, enabling research on detecting and localizing small manipulated regions in speech.

Findings

01

Partially fake audio is more challenging to detect than fully fake audio.

02

State-of-the-art speech synthesis can generate convincing manipulated words.

03

Benchmark results demonstrate the difficulty of detecting partial fakes.

Abstract

Diverse promising datasets, such as the ASVspoof databases, have been designed to advance the development of fake audio detection. However, previous datasets ignore an attack scenario in which the attacker hides small fake clips in real speech audio. This poses a serious threat because it is difficult to distinguish a small fake clip from the rest of the utterance. Therefore, this paper develops such a dataset for half-truth audio detection (HAD). Partially fake audio in the HAD dataset involves changing only a few words in an utterance. The audio of the words is generated with the very latest state-of-the-art speech synthesis technology. Using this dataset, we can not only detect fake utterances but also localize manipulated regions in speech. Some benchmark results are presented on this dataset. The results show that partially fake audio is much more challenging than…
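The abstract distinguishes two tasks: deciding whether an utterance contains any fake audio, and localizing which regions were manipulated. The sketch below illustrates how labels for both tasks could be derived when a synthetic word is spliced into real speech. It is not the authors' construction pipeline, which this page does not describe. The function names (`splice`, `frame_labels`, `utterance_label`), the 160-sample frame length (10 ms at an assumed 16 kHz sampling rate), and the toy sample values are all hypothetical.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open sample range [start, end)


def splice(real: List[float], fake_word: List[float],
           start: int, end: int) -> Tuple[List[float], Region]:
    """Replace samples [start, end) of `real` with `fake_word`.

    Returns the spliced audio and the region the fake word occupies in it.
    """
    if not 0 <= start <= end <= len(real):
        raise ValueError("splice range is outside the real audio")
    spliced = real[:start] + fake_word + real[end:]
    return spliced, (start, start + len(fake_word))


def frame_labels(num_samples: int, fake_regions: List[Region],
                 frame_len: int = 160) -> List[int]:
    """One label per frame: 1 if the frame overlaps a fake region, else 0."""
    num_frames = (num_samples + frame_len - 1) // frame_len  # ceiling division
    labels = [0] * num_frames
    for start, end in fake_regions:
        if not 0 <= start < end <= num_samples:
            raise ValueError(f"bad region: {(start, end)}")
        first = start // frame_len
        last = (end - 1) // frame_len  # frame holding the last fake sample
        for i in range(first, last + 1):
            labels[i] = 1
    return labels


def utterance_label(labels: List[int]) -> int:
    """Utterance-level label: 1 if any frame is fake, else 0."""
    return int(any(labels))


# Toy example: 1600 "real" samples with a 400-sample "fake word" replacing
# samples 480..799 (assumed values, for illustration only).
real = [0.0] * 1600
fake_word = [1.0] * 400
audio, region = splice(real, fake_word, 480, 800)
labels = frame_labels(len(audio), [region])
```

In this toy case only three of eleven frames are marked fake, which reflects the imbalance the abstract points to: the manipulated portion is a small fraction of an otherwise genuine utterance.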

Code & Models

Repositories

https://zenodo.org/record/10377492
Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis