Precise Detection of Speech Endpoints Dynamically: A Wavelet Convolution based approach
Tanmoy Roy, Tshilidzi Marwala, Snehashish Chakraverty

TL;DR
This paper introduces WCSEPD, a wavelet convolution-based algorithm that accurately detects speech endpoints in noisy conditions without requiring labeled training data, improving over traditional energy and zero-crossing methods.
Contribution
The paper presents a novel wavelet convolution approach for speech endpoint detection that effectively handles non-speech artifacts without the need for labeled training data.
Findings
Accurately detects speech endpoints amidst non-speech artifacts
Does not require labeled training data
Outperforms traditional energy-based methods
Abstract
Precise detection of speech endpoints is an important factor that affects the performance of systems in which speech utterances must be extracted from the speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use Short-Term Energy (STE) and Zero-Crossing Rate (ZCR) based approaches and their variants. However, STE- and ZCR-based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by the speakers. Algorithms based on pattern recognition and classification techniques have also been proposed, but they require labeled data for training. This article proposes a new algorithm, termed Wavelet Convolution based Speech Endpoint Detection (WCSEPD), to extract speech endpoints. WCSEPD decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and computes entropy…
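The abstract describes WCSEPD only up to the point where it uses wavelet convolution to split the signal into low- and high-frequency bands and then computes an entropy. The remainder of the description is truncated. The pure-Python sketch below illustrates that general idea, and it rests on several assumptions. It uses a one-level Haar filter pair, computes a per-frame Shannon entropy of how energy splits between the two bands, and applies a simple two-threshold decision rule. The names `convolve_downsample`, `frame_features`, `detect_endpoints` and `demo_signal`, along with the parameters `frame_len`, `energy_ratio` and `max_entropy`, are all hypothetical. None of this should be read as the authors' WCSEPD implementation.

```python
import math
from typing import List, Optional, Tuple

# ASSUMPTION: a one-level Haar wavelet. The excerpt does not say which wavelet
# WCSEPD convolves with, so this filter pair is illustrative only.
INV_SQRT2 = 1.0 / math.sqrt(2.0)
HAAR_LOW = [INV_SQRT2, INV_SQRT2]     # low-frequency (approximation) filter
HAAR_HIGH = [INV_SQRT2, -INV_SQRT2]   # high-frequency (detail) filter


def convolve_downsample(x: List[float], h: List[float]) -> List[float]:
    """Convolve x with filter h and keep every second output sample."""
    out = []
    for i in range(1, len(x), 2):
        acc = 0.0
        for j, hj in enumerate(h):
            if 0 <= i - j < len(x):
                acc += hj * x[i - j]
        out.append(acc)
    return out


def frame_features(signal: List[float], frame_len: int) -> List[Tuple[float, float]]:
    """Per-frame (energy, band entropy) from the low/high wavelet components.

    HYPOTHETICAL feature: the Shannon entropy (in nats) of how a frame's energy
    splits between the two bands; not necessarily the entropy WCSEPD computes.
    """
    low = convolve_downsample(signal, HAAR_LOW)
    high = convolve_downsample(signal, HAAR_HIGH)
    half = frame_len // 2  # each band has half as many samples as the input
    feats = []
    for start in range(0, len(low) - half + 1, half):
        e_low = sum(v * v for v in low[start:start + half])
        e_high = sum(v * v for v in high[start:start + half])
        # For even frame_len this equals the frame's short-term energy (Haar is orthonormal).
        total = e_low + e_high
        if total == 0.0:
            feats.append((0.0, math.log(2)))  # silent frame: treat the split as maximally uncertain
            continue
        entropy = 0.0
        for e in (e_low, e_high):
            p = e / total
            if p > 0.0:
                entropy -= p * math.log(p)
        feats.append((total, entropy))
    return feats


def detect_endpoints(signal: List[float], frame_len: int = 256,
                     energy_ratio: float = 0.1,
                     max_entropy: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (first, last) speech frame indices, or None if no frame qualifies.

    HYPOTHETICAL rule: a frame counts as speech if its energy is at least
    energy_ratio times the loudest frame AND its band entropy is at most max_entropy.
    """
    feats = frame_features(signal, frame_len)
    if not feats:
        return None
    peak = max(e for e, _ in feats)
    if peak == 0.0:
        return None
    speech = [i for i, (e, h) in enumerate(feats)
              if e >= energy_ratio * peak and h <= max_entropy]
    if not speech:
        return None
    return speech[0], speech[-1]


def demo_signal(frame_len: int = 256) -> List[float]:
    """Toy input: silence, a click train, silence, 4 frames of a low tone, silence."""
    silence = [0.0] * frame_len
    clicks = [1.0 if n % 2 == 0 else 0.0 for n in range(frame_len)]
    tone = [math.sin(2 * math.pi * 5 * n / frame_len) for n in range(4 * frame_len)]
    return silence + clicks + silence + tone + silence * 3


if __name__ == "__main__":
    print(detect_endpoints(demo_signal()))
```

Running the module prints `(3, 6)`. In the toy input, the click train in frame 1 carries as much energy as the tone but splits that energy evenly between the two bands. An energy test alone would therefore place the start point at frame 1: `detect_endpoints(demo_signal(), max_entropy=1.0)` returns `(1, 6)`. The entropy test rejects that frame. This toy only illustrates why a band-level feature can help against broadband artifacts. It makes no claim about WCSEPD's actual accuracy.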
