LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech

Xuechen Liu; Wanying Ge; Xin Wang; Junichi Yamagishi

arXiv:2507.16220 · cs.SD · July 25, 2025

Open Access

TL;DR

LENS-DF is a recipe for training and evaluating audio deepfake detection and temporal localization under realistic long-form, noisy, multi-speaker conditions. Models trained on its generated data outperform those trained with conventional recipes.

Contribution

The paper introduces LENS-DF, a novel recipe for generating challenging audio data and a protocol for robust deepfake detection and localization in complex scenarios.

Findings

01 Models trained on data generated with LENS-DF consistently outperform models trained via conventional recipes.

02 LENS-DF improves robustness in noisy and multi-speaker audio conditions.

03 Ablation studies on the introduced variations highlight the importance of data characteristics for detection performance.

Abstract

This study introduces LENS-DF, a novel and comprehensive recipe for training and evaluating audio deepfake detection and temporal localization under complicated and realistic audio conditions. The generation part of the recipe takes an input dataset and outputs audio with several critical characteristics, such as longer duration, noisy conditions, and multiple speakers, in a controllable fashion. The corresponding detection and localization protocol uses models. We conduct experiments based on a self-supervised learning front-end and a simple back-end. The results indicate that models trained using data generated with LENS-DF consistently outperform those trained via conventional recipes, demonstrating the effectiveness and usefulness of LENS-DF for robust audio deepfake detection and localization. We also conduct ablation studies on the variations introduced, investigating their…
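To make the generation idea concrete, the following is a minimal sketch of how long-form, noisy, multi-speaker audio with temporal labels could be assembled. It is not the authors' implementation: the function names `make_long_form` and `frame_labels`, the label strings "bonafide" and "spoof", the use of white Gaussian noise, and the frame-level labelling rule are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def make_long_form(utterances, snr_db, seed=0):
    """Concatenate utterances and add white noise at a target SNR.

    Hypothetical helper, not taken from the paper.
    utterances: list of (samples, label, speaker) tuples, where samples is a
    list of floats and label is "bonafide" or "spoof".
    Returns (noisy_samples, segments), where each segment is
    (start_sample, end_sample, label, speaker) with an exclusive end.
    """
    rng = random.Random(seed)
    clean, segments = [], []
    for samples, label, speaker in utterances:
        start = len(clean)
        clean.extend(samples)
        segments.append((start, len(clean), label, speaker))
    if not clean:
        raise ValueError("at least one non-empty utterance is required")

    # Scale the noise so that signal power / noise power matches snr_db.
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noisy = [x + rng.gauss(0.0, sigma) for x in clean]
    return noisy, segments


def frame_labels(segments, total_len, frame_len):
    """Mark a frame 1 if it overlaps any spoof segment, else 0 (assumed rule)."""
    n_frames = math.ceil(total_len / frame_len)
    labels = [0] * n_frames
    for start, end, label, _speaker in segments:
        if label != "spoof" or end <= start:
            continue
        for f in range(start // frame_len, (end - 1) // frame_len + 1):
            labels[f] = 1
    return labels


if __name__ == "__main__":
    utts = [
        ([0.5] * 100, "bonafide", "spk1"),
        ([0.5] * 60, "spoof", "spk2"),
    ]
    audio, segs = make_long_form(utts, snr_db=20)
    print(len(audio), segs)
    print(frame_labels(segs, len(audio), frame_len=40))
```

In this sketch the controllable parameters are the number of utterances concatenated (duration and speaker count) and `snr_db` (noise level). The segment list gives the ground truth for temporal localization, and collapsing it to a single "any spoof present" flag would give an utterance-level detection label.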

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing