Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals
Marvin Coto-Jimenez

TL;DR
This paper introduces a supervised initialization method for LSTM networks using an auto-associative network to improve fundamental frequency detection in noisy speech signals, enhancing accuracy and training efficiency.
Contribution
It presents a novel supervised initialization approach for LSTM networks, improving fundamental frequency detection in noisy speech over traditional random initialization.
Findings
Supervised initialization improves detection accuracy.
Enhanced training efficiency under noisy conditions.
Better performance across different noise levels.
Abstract
Fundamental frequency is one of the most important parameters of human speech, relevant to the classification of accent, gender, speaking style, and age, and to speaker identification, among other tasks. Its proper detection remains an important challenge for severely degraded signals. In previous work on detecting fundamental frequency in noisy speech using deep learning, networks such as Long Short-Term Memory (LSTM) have been initialized with random weights and then trained with a back-propagation through time algorithm. This work proposes a more efficient initialization, based on supervised training using an auto-associative network. This initialization is a better starting point for the detection of fundamental frequency in noisy speech. The advantages of this initialization are noticeable using objective measures for the…
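The two-stage idea in the abstract (train an auto-associative network first, then reuse its weights as the starting point for the detector) can be illustrated with a minimal PyTorch sketch. Everything specific here is an assumption for illustration and not taken from the paper: the `LSTMRegressor` class, the layer sizes, the random placeholder tensors standing in for speech features and F0 targets, the Adam optimizer, and the choice to train the auto-associative stage on the same features the detector later sees.

```python
import torch
import torch.nn as nn


class LSTMRegressor(nn.Module):
    """Hypothetical model: one LSTM layer plus a per-frame linear output."""

    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        h, _ = self.lstm(x)      # h: (batch, frames, n_hidden)
        return self.out(h)       # one prediction per frame


def train(model, inputs, targets, epochs=5, lr=1e-2):
    """Full-batch training loop; autograd through the LSTM performs BPTT."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)
n_feat, n_hidden = 8, 16
features = torch.randn(4, 20, n_feat)   # placeholder for speech features
f0 = torch.randn(4, 20, 1)              # placeholder for F0 targets

# Stage 1: auto-associative training, where the target equals the input.
auto = LSTMRegressor(n_feat, n_hidden, n_feat)
train(auto, features, features)

# Stage 2: build the F0 detector and start it from the stage-1 LSTM weights
# instead of random ones. The output layer keeps its default initialization
# because its size differs (an assumption of this sketch).
detector = LSTMRegressor(n_feat, n_hidden, 1)
detector.lstm.load_state_dict(auto.lstm.state_dict())
transferred = all(
    torch.equal(a, b)
    for a, b in zip(auto.lstm.parameters(), detector.lstm.parameters())
)

# Stage 3: fine-tune the detector on the F0 task.
train(detector, features, f0)
```

A randomly initialized baseline would skip stages 1 and 2 and train `detector` directly, which is the setup the abstract attributes to earlier work.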
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
