FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
Hasam Khalid, Shahroz Tariq, Minha Kim, Simon S. Woo

TL;DR
FakeAVCeleb is a new high-quality multimodal deepfake dataset containing both audio and video, designed to improve detection of synthetic media and address racial bias in existing datasets.
Contribution
We introduce FakeAVCeleb, a comprehensive audio-video deepfake dataset with diverse ethnic backgrounds, enabling better development of multimodal deepfake detectors.
Findings
State-of-the-art detectors face challenges with the dataset.
The dataset reveals biases in current deepfake detection methods.
Multimodal detection improves robustness against deepfakes.
Abstract
While significant advancements have been made in generating deepfakes with deep learning technologies, their misuse is now a well-known issue. Deepfakes can cause severe security and privacy issues, as they can be used to impersonate a person's identity in a video by replacing their face with another person's face. Recently, a new problem has emerged: synthesizing a person's voice, where AI-based deep learning models can generate any person's voice from just a few seconds of audio. With the emerging threat of impersonation attacks using deepfake audio and video, a new generation of deepfake detectors is needed that analyzes video and audio collectively. Developing a competent deepfake detector typically requires a large amount of high-quality data that captures real-world (or practical) scenarios. Existing deepfake datasets either contain deepfake…
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
