TL;DR
This paper introduces a convolutional neural network-based method for detecting Glottal Closure Instants directly from speech waveforms, trained on synthetic data to improve accuracy and generalization over existing approaches.
Contribution
It proposes a novel CNN approach trained on synthetic speech for GCI detection, outperforming or matching state-of-the-art methods and enhancing generalization.
Findings
Synthetic training data improves GCI detection accuracy.
The proposed method performs comparably to or better than existing algorithms.
Large synthetic datasets enhance generalization across speakers.
Abstract
Glottal Closure Instant (GCI) detection consists of automatically locating, from the speech signal, the instants of most significant excitation of the vocal tract. It is used in many speech analysis and processing applications, and various algorithms have been proposed for this purpose. Recently, new approaches using convolutional neural networks have emerged, with encouraging results. Following this trend, we propose a simple approach that maps the speech waveform to a target signal from which the GCIs are obtained by peak-picking. However, the ground-truth GCIs used for training and evaluation are usually extracted from electroglottographic (EGG) signals, which are not perfectly reliable and are often unavailable. To overcome this problem, we propose to train our network on high-quality synthetic speech with perfect ground truth. The performance of the proposed algorithm is compared…
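The peak-picking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pick_peaks` and `samples_to_seconds`, the `threshold` and `min_distance` parameters, and the hand-written target signal standing in for a network output are all assumptions made for illustration.

```python
from typing import List, Sequence


def pick_peaks(target: Sequence[float],
               threshold: float = 0.5,
               min_distance: int = 1) -> List[int]:
    """Return sample indices of local maxima in a target signal.

    Hypothetical helper: threshold and min_distance are illustrative
    parameters, not values taken from the paper.
    """
    # Candidate peaks: at or above threshold, strictly above the left
    # neighbour and not below the right neighbour.
    candidates = [
        i for i in range(1, len(target) - 1)
        if target[i] >= threshold
        and target[i] > target[i - 1]
        and target[i] >= target[i + 1]
    ]
    # Greedy selection: keep the strongest peaks first and drop any
    # candidate closer than min_distance samples to a kept peak.
    kept: List[int] = []
    for i in sorted(candidates, key=lambda k: target[k], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def samples_to_seconds(indices: Sequence[int], sample_rate: int) -> List[float]:
    """Convert sample indices to time stamps in seconds."""
    return [i / sample_rate for i in indices]


# Stand-in for a network output (assumed values, for illustration only).
example_target = [0.0, 0.1, 0.9, 0.2, 0.0, 0.3, 0.95, 0.8, 0.1, 0.0, 0.6, 0.1]
example_gcis = pick_peaks(example_target, threshold=0.5, min_distance=3)
```

In this sketch the minimum distance acts as a crude guard against detecting two GCIs within one glottal cycle; how the original method handles closely spaced peaks is not stated in the abstract.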
