Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals

Marvin Coto-Jimenez

arXiv:1911.04580·cs.SD·November 13, 2019

Open Access

TL;DR

This paper introduces a supervised initialization method for LSTM networks using an auto-associative network to improve fundamental frequency detection in noisy speech signals, enhancing accuracy and training efficiency.

Contribution

It presents a novel supervised initialization approach for LSTM networks, improving fundamental frequency detection in noisy speech over traditional random initialization.

Findings

01. Supervised initialization improves detection accuracy.

02. Enhanced training efficiency under noisy conditions.

03. Better performance across different noise levels.

Abstract

Fundamental frequency is one of the most important parameters of human speech, relevant to the classification of accent, gender, speaking style, and age, and to speaker identification, among other tasks. Its proper detection remains an important challenge for severely degraded signals. In previous work on detecting fundamental frequency in noisy speech using deep learning, networks such as Long Short-Term Memory (LSTM) have been initialized with random weights and then trained with the back-propagation through time algorithm. In this work, a proposal for a more efficient initialization, based on supervised training using an auto-associative network, is presented. This initialization is a better starting point for the detection of fundamental frequency in noisy speech. The advantages of this initialization are noticeable using objective measures for the…
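The two-stage idea described in the abstract can be sketched roughly as follows, assuming PyTorch. This is an illustration only, not the paper's implementation: the class name `SeqRegressor`, the helper `fit`, the layer sizes, the optimizer, and the random stand-in data are all hypothetical, and the choice to carry over only the LSTM layer (with a fresh output layer) is an assumption made so the output size can change from feature dimension to a single F0 value per frame.

```python
import torch
from torch import nn

torch.manual_seed(0)

class SeqRegressor(nn.Module):
    """Hypothetical frame-wise regressor: one LSTM layer plus a linear output layer."""
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        h, _ = self.lstm(x)   # h: (batch, time, n_hidden)
        return self.head(h)   # one output vector per frame

def fit(model, inputs, targets, steps=20, lr=1e-2):
    """Plain MSE training loop; gradients flow back through time via autograd."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()
    return loss.item()

# Random stand-in data (assumption): 4 utterances, 20 frames, 8 features per frame.
noisy = torch.randn(4, 20, 8)
f0 = torch.rand(4, 20, 1)     # placeholder per-frame F0 targets

# Stage 1: auto-associative training, i.e. the target is the input itself.
autoassoc = SeqRegressor(n_in=8, n_hidden=16, n_out=8)
fit(autoassoc, noisy, noisy)

# Stage 2: start the F0 detector from the stage-1 LSTM weights
# instead of from random weights, then train on the real targets.
detector = SeqRegressor(n_in=8, n_hidden=16, n_out=1)
detector.lstm.load_state_dict(autoassoc.lstm.state_dict())
copied_ok = all(
    torch.equal(p, q)
    for p, q in zip(autoassoc.lstm.state_dict().values(),
                    detector.lstm.state_dict().values())
)
final_loss = fit(detector, noisy, f0)
```

A randomly initialized baseline would simply skip the `load_state_dict` call; comparing the two training curves on real data is the kind of experiment the abstract alludes to.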

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing