Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets
Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess

TL;DR
This paper introduces SigPointer, a pointer network-based method for detecting audio splicing in voice recordings, demonstrating improved accuracy especially in challenging forensic scenarios with compressed and noisy signals.
Contribution
The paper presents a novel pointer network framework tailored for continuous audio input, enhancing detection of splice locations over existing methods in practical forensic conditions.
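As a rough illustration of what "pointing" at a location in a continuous input means, the sketch below computes a generic additive pointer-attention distribution over input frames (tanh scoring followed by a softmax). It is not SigPointer's actual architecture: the function name, the dimensions, and the random weights are hypothetical and only show the mechanism.

```python
import math
import random


def pointer_distribution(encoder_states, decoder_state, W_enc, W_dec, v):
    """Generic additive pointer attention: one probability per input frame.

    encoder_states: T vectors of length H (one per signal frame)
    decoder_state:  vector of length H (current decoding step)
    W_enc, W_dec:   H x A projection matrices, v: vector of length A
    All of these are illustrative assumptions, not values from the paper.
    """
    def project(vec, W):
        # Multiply a length-H vector by an H x A matrix, giving a length-A vector.
        return [sum(x * w for x, w in zip(vec, col)) for col in zip(*W)]

    dec_proj = project(decoder_state, W_dec)
    scores = []
    for state in encoder_states:
        enc_proj = project(state, W_enc)
        # Score for this frame: v^T tanh(W_enc e_j + W_dec d)
        scores.append(sum(v_k * math.tanh(a + b)
                          for v_k, a, b in zip(v, enc_proj, dec_proj)))

    # Softmax over frames, shifted by the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, H, A = 50, 8, 16  # hypothetical: 50 frames, hidden size 8, attention size 16
encoder_states = rand_matrix(T, H)
decoder_state = [random.gauss(0, 1) for _ in range(H)]
W_enc, W_dec = rand_matrix(H, A), rand_matrix(H, A)
v = [random.gauss(0, 1) for _ in range(A)]

probs = pointer_distribution(encoder_states, decoder_state, W_enc, W_dec, v)
# The "pointer" is the frame index with the highest probability.
predicted_frame = max(range(T), key=probs.__getitem__)
```

In a trained model the weights would be learned so that the highest-probability frame coincides with a splice location; with random weights, as here, the output only demonstrates the shape of the computation.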
Findings
Performance improvements of 6-10 percentage points over previous methods.
Effective detection in strongly compressed and noisy audio signals.
Demonstrated robustness in challenging forensic scenarios.
Abstract
Verifying the integrity of voice recording evidence for criminal investigations is an integral part of an audio forensic analyst's work. Here, one focus is on detecting deletion or insertion operations, so-called audio splicing. While this is a rather easy approach to alter spoken statements, careful editing can yield quite convincing results. For difficult cases or large amounts of data, automated tools can help detect potential editing locations. To this end, several analytical and deep learning methods have been proposed by now. Still, few address unconstrained splicing scenarios as expected in practice. With SigPointer, we propose a pointer network framework for continuous input that uncovers splice locations naturally and more efficiently than existing works. Extensive experiments on forensically challenging data like strongly compressed and noisy signals quantify the…
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection · Speech Recognition and Synthesis
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Focus · Softmax · Pointer Network
