Time-Domain Based Embeddings for Spoofed Audio Representation
Matan Karo, Arie Yeredor, Itshak Lapidot

TL;DR
This paper introduces a time-domain embedding method for speech anti-spoofing that uses probability mass function (PMF) estimation and diffusion maps to improve spoof detection and visualization.
Contribution
It proposes a time-domain feature extraction approach based on distance and similarity measures between amplitude PMFs, combined with diffusion maps for analyzing spoofing attacks.
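A PMF-distance feature extractor of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary names neither the distances nor the bin count nor the reference PMFs. The helper names (`amplitude_pmf`, `pmf_distance_features`), the 64-bin histogram over [-1, 1], and the four measures (KL, Jensen-Shannon, Hellinger, Bhattacharyya) are all hypothetical choices, made only because they are common.

```python
import numpy as np

def amplitude_pmf(x, n_bins=64, eps=1e-10):
    """Hypothetical helper: histogram estimate of the PMF of waveform amplitudes in [-1, 1]."""
    x = np.clip(np.asarray(x, dtype=float), -1.0, 1.0)
    counts, _ = np.histogram(x, bins=n_bins, range=(-1.0, 1.0))
    p = counts.astype(float) + eps  # small floor keeps log-based distances finite
    return p / p.sum()

def kl(p, q):
    """Kullback-Leibler divergence D(p || q)."""
    return float(np.sum(p * np.log(p / q)))

def jensen_shannon(p, q):
    """Symmetric Jensen-Shannon divergence (natural log, so bounded by log 2)."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    """Hellinger distance, normalized to lie in [0, 1]."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def bhattacharyya(p, q):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    return float(-np.log(np.sum(np.sqrt(p * q))))

def pmf_distance_features(x, reference_pmfs, n_bins=64):
    """Hypothetical feature vector: four distances from x's PMF to each reference PMF."""
    p = amplitude_pmf(x, n_bins)
    feats = []
    for q in reference_pmfs:
        feats.extend([kl(p, q), jensen_shannon(p, q), hellinger(p, q), bhattacharyya(p, q)])
    return np.array(feats)

# Usage with synthetic signals (not real speech): a Laplacian-like "reference"
# waveform and uniform noise, compared against the reference PMF.
rng = np.random.default_rng(0)
reference = np.clip(rng.laplace(0.0, 0.1, 16000), -1.0, 1.0)
noise = rng.uniform(-1.0, 1.0, 16000)
refs = [amplitude_pmf(reference)]
f_ref = pmf_distance_features(reference, refs)
f_noise = pmf_distance_features(noise, refs)
```

With several reference PMFs (for example one per class or per attack type), the output concatenates four distances per reference. This yields a fixed-length vector that a classifier or an embedding method can consume.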
Findings
PMF-based features improve spoof detection accuracy
Diffusion maps reveal underlying data manifolds
Visualization aids in understanding spoofing techniques
Abstract
Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to suggest new representations for genuine and spoofed speech, based on estimating the probability mass function (PMF) of the audio waveform's amplitude. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, it directly processes time-domain audio samples. The PMF is exploited through a feature extractor built on several PMF distance and similarity measures. As an additional step, we use filter-bank preprocessing, which significantly affects the discriminative characteristics of the features and facilitates convenient visualization of possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data…
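The diffusion-map step can be illustrated with the standard construction: a Gaussian kernel, a Markov matrix P = D^{-1}K, and eigenvectors scaled by powers of their eigenvalues. The sketch below is written under assumptions the summary does not confirm. The function name `diffusion_map`, the median heuristic for the kernel width `epsilon`, and the diffusion time `t` are illustrative choices, not the authors' settings. The input rows stand in for per-utterance feature vectors.

```python
import numpy as np

def diffusion_map(X, n_components=2, t=1, epsilon=None):
    """Standard diffusion-map embedding of the rows of X (hypothetical settings)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances, clipped at 0 against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    if epsilon is None:
        # Assumed median heuristic for the Gaussian kernel width.
        epsilon = np.median(d2[d2 > 0])
    K = np.exp(-d2 / epsilon)               # Gaussian affinity kernel
    d = K.sum(axis=1)                       # row sums (degrees)
    # Symmetric conjugate of P = D^{-1} K; it shares P's eigenvalues.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]          # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]        # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    emb = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return emb, vals

# Usage with synthetic data: two well-separated blobs standing in for two
# groups of feature vectors (e.g. genuine vs. one spoofing attack).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(10.0, 0.1, (20, 3))])
emb, vals = diffusion_map(X, n_components=2)
```

The code diagonalizes the symmetric matrix A = D^{-1/2} K D^{-1/2} rather than P itself. A has the same eigenvalues as P, and `np.linalg.eigh` is numerically stable for symmetric input. The right eigenvectors of P are then recovered as D^{-1/2} times the eigenvectors of A. On well-separated groups, the first nontrivial coordinate already splits the groups, which is the kind of cluster structure a 2-D diffusion-map plot makes visible.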
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Diffusion
