Time-Domain Based Embeddings for Spoofed Audio Representation

Matan Karo; Arie Yeredor; Itshak Lapidot

arXiv:2210.15428 · eess.AS · October 28, 2022

TL;DR

This paper introduces a time-domain embedding method for speech anti-spoofing that uses probability mass function (PMF) estimation of waveform amplitudes, together with diffusion maps, to improve spoof detection and visualization.

Contribution

It proposes a new time-domain feature extraction approach based on PMF distances, combined with diffusion maps for better spoofing attack analysis.
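The idea of PMF-distance features can be illustrated with a minimal sketch. It assumes 16-bit integer samples, a fixed-width histogram as the PMF estimator, and a single reference PMF to compare against. The function names, the bin count, and the particular measures chosen (total variation, Hellinger, KL divergence, cosine similarity) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def amplitude_pmf(samples, n_bins=256, eps=1e-12):
    """Estimate the PMF of int16 waveform amplitudes with a fixed histogram.

    n_bins and the eps smoothing are assumptions made for this sketch.
    """
    samples = np.asarray(samples, dtype=np.int64)
    edges = np.linspace(-32768, 32768, n_bins + 1)
    counts, _ = np.histogram(samples, bins=edges)
    pmf = counts.astype(np.float64) + eps  # avoid zeros before log/ratio
    return pmf / pmf.sum()

def pmf_features(p, q):
    """Stack a few distance and similarity measures between two PMFs."""
    total_variation = 0.5 * np.abs(p - q).sum()
    hellinger = np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum())
    kl_divergence = np.sum(p * np.log(p / q))
    cosine = p @ q / (np.linalg.norm(p) * np.linalg.norm(q))
    return np.array([total_variation, hellinger, kl_divergence, cosine])

# Synthetic stand-ins for a reference recording and a test recording.
rng = np.random.default_rng(0)
reference = np.clip(rng.normal(0, 3000, 16000), -32768, 32767).astype(np.int16)
test_clip = np.clip(rng.laplace(0, 3000, 16000), -32768, 32767).astype(np.int16)

features = pmf_features(amplitude_pmf(test_clip), amplitude_pmf(reference))
```

Each utterance is reduced to a short vector of such measures, which a downstream classifier or visualization step could then consume.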

Findings

01. PMF-based features improve spoof detection accuracy

02. Diffusion maps reveal underlying data manifolds

03. Visualization aids in understanding spoofing techniques

Abstract

Anti-spoofing is the task of speech authentication, that is, distinguishing genuine human speech from spoofed speech. The main focus of this paper is to suggest new representations for genuine and spoofed speech, based on the probability mass function (PMF) estimation of the audio waveforms' amplitude. We introduce a new feature extraction method for speech audio signals: unlike traditional methods, our method is based on direct processing of time-domain audio samples. The PMF is utilized by designing a feature extractor based on different PMF distances and similarity measures. As an additional step, we use filter-bank preprocessing, which significantly affects the discriminative characteristics of the features and facilitates convenient visualization of possible clustering of spoofing attacks. Furthermore, we use diffusion maps to reveal the underlying manifold on which the data…
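The diffusion-map step mentioned in the abstract can be sketched in its textbook form. The Gaussian kernel, the median bandwidth heuristic, the function name `diffusion_map`, and the synthetic input are all assumptions made for illustration; the abstract does not state the kernel or parameters the authors use.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Basic diffusion-map embedding of the rows of X (one feature vector per row)."""
    X = np.asarray(X, dtype=np.float64)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if epsilon is None:
        # Assumed heuristic: median of the nonzero squared distances.
        epsilon = np.median(d2[d2 > 0])
    K = np.exp(-d2 / epsilon)              # Gaussian affinity kernel
    d = K.sum(axis=1)                      # node degrees
    # Symmetric matrix with the same eigenvalues as the Markov matrix P = D^-1 K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]         # sort eigenvalues in descending order
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Skip the trivial constant eigenvector; scale by eigenvalue^t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

# Two synthetic clusters standing in for two groups of utterance features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(10.0, 1.0, (20, 5))])
embedding = diffusion_map(X, n_components=2, epsilon=50.0)
```

In this sketch, points that are strongly connected under the kernel land close together in the embedding, which is what makes a two- or three-dimensional plot of the coordinates useful for inspecting possible clusters.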

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing

Methods: Diffusion Maps