Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

Chunyan Zeng; Shixiong Feng; Zhifeng Wang; Xiangkui Wan; Yunfan Chen; Nan Zhao

arXiv:2208.12753 · cs.SD · August 29, 2022

Open Access

TL;DR

This paper introduces a spatio-temporal representation learning approach for source cell-phone recognition from speech recordings, significantly improving accuracy by capturing long-term device features.

Contribution

It proposes a novel method combining Gaussian mean matrix features with a C3D-BiLSTM network for enhanced recognition accuracy.

Findings

01. Achieves 99.03% accuracy on the CCNU_Mobile dataset

02. Outperforms existing state-of-the-art methods

03. Effective in small-sample scenarios

Abstract

Existing source cell-phone recognition methods lack long-term feature characterization of the source device, so device-related features are represented inaccurately and recognition accuracy suffers. In this paper, we propose a source cell-phone recognition method based on spatio-temporal representation learning, which consists of two main parts: extraction of sequential Gaussian mean matrix features and construction of a recognition model based on spatio-temporal representation learning. In the feature extraction part, based on an analysis of the time-series representation of recording source signals, we extract a sequential Gaussian mean matrix with both long-term and short-term representation ability by exploiting the sensitivity of the Gaussian mixture model to the data distribution. In the model construction part, we design a structured spatio-temporal…
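The feature-extraction idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that frame-level acoustic features (for example MFCCs) have already been computed, that a shared diagonal-covariance Gaussian mixture model is given, and that per-segment Gaussian means are obtained by mean-only MAP adaptation, which is a swapped-in standard technique chosen here for illustration. The function names and the relevance factor of 16 are hypothetical.

```python
import numpy as np


def posteriors(frames, weights, means, variances):
    """Responsibilities of K diagonal Gaussians for N frames, shape (N, K)."""
    diff = frames[:, None, :] - means[None, :, :]            # (N, K, D)
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                # Mahalanobis term
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)    # normalisation term
    )
    log_w = np.log(weights) + log_prob                       # (N, K)
    log_w -= log_w.max(axis=1, keepdims=True)                # numerical stability
    resp = np.exp(log_w)
    return resp / resp.sum(axis=1, keepdims=True)


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation (assumed here, not taken from the paper)."""
    resp = posteriors(frames, weights, means, variances)
    n_k = resp.sum(axis=0)                                   # soft counts, (K,)
    e_k = (resp.T @ frames) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]               # data vs. prior weight
    return alpha * e_k + (1.0 - alpha) * means               # (K, D)


def sequential_mean_matrices(frames, weights, means, variances,
                             n_segments, relevance=16.0):
    """Split a recording into segments and stack one mean matrix per segment.

    Returns an array of shape (n_segments, K, D): the segment axis carries the
    long-term ordering, and each (K, D) matrix summarises short-term statistics.
    """
    segments = np.array_split(frames, n_segments, axis=0)
    return np.stack([
        map_adapt_means(seg, weights, means, variances, relevance)
        for seg in segments
    ])


# Toy usage with synthetic data standing in for real frame-level features.
rng = np.random.default_rng(0)
toy_frames = rng.normal(size=(200, 13))      # 200 frames, 13-dim features
toy_weights = np.full(4, 0.25)               # 4 equally weighted components
toy_means = rng.normal(size=(4, 13))
toy_variances = np.ones((4, 13))
toy_sequence = sequential_mean_matrices(
    toy_frames, toy_weights, toy_means, toy_variances, n_segments=5
)
print(toy_sequence.shape)
```

Under these assumptions, the stacked array keeps the segment order along its first axis, which is the kind of input a sequence model such as the C3D-BiLSTM network named in the Contribution section could consume.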

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Memory Network