Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications

Nirmal Joshua Kapu; Raghav Karan

arXiv:2411.18636·cs.SD·December 2, 2024


TL;DR

This paper surveys convolution-based speech processing models, analyzing their statistical foundations, applications, and performance, to guide future research and improve speech technology systems.

Contribution

It provides a comprehensive statistical perspective on convolutional models and compares their performance across various speech processing tasks.

Findings

01

Convolutional models vary in accuracy, speed, and model size.

02

Statistical analysis highlights strengths and weaknesses of each model.

03

The survey identifies potential errors and future research directions.

Abstract

This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models, and provides their statistical backgrounds along with their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through a comparative assessment of training cost, model size, accuracy, and speed, we compare the strengths and weaknesses of each model, identify potential errors, and propose avenues for further research, emphasizing the central role these models play in advancing applications of speech technologies.
