Detection of Glottal Closure Instants from Raw Speech using Convolutional Neural Networks

Mohit Goyal; Varun Srivastava; Prathosh A. P

arXiv:1804.10147·cs.SD·July 11, 2019·1 cites

TL;DR

This paper introduces a deep convolutional neural network approach for detecting Glottal Closure Instants directly from raw speech signals, eliminating the need for signal transformation and heuristic algorithms, and demonstrating improved robustness in noisy conditions.

Contribution

The paper presents an end-to-end deep learning method for GCI detection that learns representations directly from raw speech. It performs comparably to traditional two-stage methods on clean speech and outperforms them in noisy environments.

Findings

01. Comparable to state-of-the-art methods in clean speech
02. Outperforms existing methods in non-stationary noise conditions
03. Demonstrates the effectiveness of representation learning for GCI detection

Abstract

Glottal Closure Instants (GCIs) correspond to the temporal locations of significant excitation to the vocal tract occurring during the production of voiced speech. GCI detection from speech signals is a well-studied problem given its importance in speech processing. Most existing approaches for GCI detection adopt a two-stage approach: (i) transformation of the speech signal into a representative signal where GCIs are localized better, and (ii) extraction of GCIs using the representative signal obtained in the first stage. The former stage is accomplished using signal processing techniques based on the principles of speech production, and the latter with heuristic algorithms such as dynamic programming and peak-picking. These methods are thus task-specific and rely on the methods used for representative signal extraction. However, in this paper, we formulate the GCI detection problem from a…
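The two-stage pipeline that the abstract contrasts against can be illustrated with a toy sketch. This is not the paper's method or any specific published baseline. The synthetic signal, the first-order linear-prediction residual used as the "representative signal", the function names, and the threshold values are all assumptions chosen for illustration.

```python
# Toy two-stage GCI detector on a synthetic signal (illustrative assumptions only).

def synth_voiced(n_samples=800, period=80, first=40, pole=0.9):
    """Toy voiced signal: an impulse train passed through a one-pole filter.

    Returns the signal and the known impulse positions (the "true" GCIs).
    """
    gci_true = list(range(first, n_samples, period))
    excitation = [0.0] * n_samples
    for g in gci_true:
        excitation[g] = 1.0
    speech, prev = [], 0.0
    for x in excitation:
        prev = x + pole * prev  # y[n] = x[n] + pole * y[n-1]
        speech.append(prev)
    return speech, gci_true


def representative_signal(speech):
    """Stage (i): first-order linear-prediction residual (assumed stand-in)."""
    num = sum(speech[n] * speech[n - 1] for n in range(1, len(speech)))
    den = sum(s * s for s in speech[:-1])
    a_hat = num / den if den else 0.0  # least-squares predictor coefficient
    residual = [speech[0]]
    for n in range(1, len(speech)):
        residual.append(speech[n] - a_hat * speech[n - 1])
    return residual


def pick_peaks(signal, rel_threshold=0.5, min_distance=20):
    """Stage (ii): heuristic peak-picking with a relative threshold."""
    threshold = rel_threshold * max(signal)
    peaks = []
    for n in range(1, len(signal) - 1):
        is_local_max = signal[n - 1] < signal[n] >= signal[n + 1]
        if is_local_max and signal[n] >= threshold:
            if not peaks or n - peaks[-1] >= min_distance:
                peaks.append(n)
    return peaks


if __name__ == "__main__":
    speech, gci_true = synth_voiced()
    gci_est = pick_peaks(representative_signal(speech))
    print("true:", gci_true)
    print("estimated:", gci_est)
```

Both stages depend on hand-set choices (the predictor order, `rel_threshold`, `min_distance`). That is the kind of task-specific tuning the abstract attributes to two-stage methods, which tends to be fragile once noise is added.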

Code & Models

Repositories

VarunSrivastavaIITD/DCNN (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing