End-Point Detection with State Transition Model based on Chunk-Wise Classification

Juntae Kim; Jaesung Bae; Minsoo Hahn

arXiv:1912.10442 · eess.AS · December 24, 2019

TL;DR

This paper introduces a robust end-point detection method that uses a state transition model driven by chunk-wise classification. The model reduces the undesired state transitions that VAD errors cause in noisy environments, which improves speech/non-speech detection accuracy.

Contribution

It proposes a novel chunk-wise classification approach for state transition modeling in end-point detection, enhancing robustness against VAD errors in noisy conditions.
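To make the chunk-wise classification concrete, the following minimal Python sketch aggregates frame-level VAD decisions into one decision per chunk, following the description in the abstract (a chunk spans multiple frames and its label comes from aggregating the VAD decisions of those frames). The aggregation rule, a strict speech-fraction vote with threshold `speech_ratio`, and the function name `classify_chunks` are hypothetical choices for illustration; this summary does not specify the paper's exact rule or chunk size.

```python
from typing import List, Sequence


def classify_chunks(
    frame_decisions: Sequence[int],
    chunk_size: int,
    speech_ratio: float = 0.5,
) -> List[int]:
    """Turn frame-level VAD decisions (1 = speech, 0 = non-speech) into
    chunk-level decisions by a ratio vote over each chunk.

    speech_ratio is a hypothetical threshold: a chunk counts as speech only when
    strictly more than this fraction of its frames are speech. A trailing
    partial chunk is classified on the frames it contains.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    decisions = []
    for start in range(0, len(frame_decisions), chunk_size):
        chunk = frame_decisions[start:start + chunk_size]
        speech_fraction = sum(chunk) / len(chunk)
        decisions.append(1 if speech_fraction > speech_ratio else 0)
    return decisions


if __name__ == "__main__":
    # Toy VAD output: spurious speech frames in chunks 0 and 3 and a missed
    # speech frame in chunk 1; the per-chunk vote absorbs all three errors.
    frames = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]
    print(classify_chunks(frames, chunk_size=4))  # [0, 1, 1, 0]
```

With 16 frames and a chunk size of 4, the three isolated frame errors never flip a chunk label, so a state model fed by these four chunk decisions sees only one speech segment.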

Findings

01

Improved accuracy in noisy environments.

02

Reduced false transitions due to chunk-wise aggregation.

03

Lower phone error rate in evaluations.

Abstract

A state transition model (STM) based on chunk-wise classification is proposed for end-point detection (EPD). In general, EPD is built from frame-wise voice activity detection (VAD) combined with an STM, in which state transitions are driven by the VAD's frame-level decisions (speech or non-speech). However, VAD errors occur frequently in noisy environments, even with a state-of-the-art deep neural network (DNN) based VAD, and these errors cause undesired state transitions in the STM. In this work, to build a robust STM, state transitions are driven by chunk-wise classification, since EPD does not need to operate at the frame level. Each chunk consists of multiple frames, and a chunk is classified as speech or non-speech by aggregating the VAD decisions of its frames, so that some undesired VAD errors in a chunk can be smoothed by other correct VAD…
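The pipeline in the abstract, chunk-wise classification feeding a state transition model, can be sketched as a small state machine whose transitions are triggered by chunk decisions rather than frame decisions. This is a minimal two-state sketch under stated assumptions: a strict speech-fraction vote per chunk, and hypothetical hangover parameters `onset_chunks` and `offset_chunks` (consecutive speech or non-speech chunks needed to enter or leave the speech state). The names `State`, `chunk_decisions` and `detect_end_points`, and all default values, are illustrative; the paper's actual STM states, chunk size and thresholds are not given in this summary.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class State(Enum):
    NON_SPEECH = 0
    SPEECH = 1


def chunk_decisions(frames: Sequence[int], chunk_size: int, speech_ratio: float) -> List[bool]:
    """Ratio vote over each chunk of frame-level VAD decisions (1 = speech)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    decisions = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        decisions.append(sum(chunk) / len(chunk) > speech_ratio)
    return decisions


def detect_end_points(
    frames: Sequence[int],
    chunk_size: int = 4,
    speech_ratio: float = 0.5,
    onset_chunks: int = 2,
    offset_chunks: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) of the first utterance, or None.

    Hypothetical hangover rule: onset_chunks consecutive speech chunks enter
    SPEECH; offset_chunks consecutive non-speech chunks leave it.
    """
    state = State.NON_SPEECH
    run = 0  # length of the current run of chunks arguing for a transition
    start_chunk: Optional[int] = None
    for i, is_speech in enumerate(chunk_decisions(frames, chunk_size, speech_ratio)):
        if state is State.NON_SPEECH:
            run = run + 1 if is_speech else 0
            if run >= onset_chunks:
                start_chunk = i - onset_chunks + 1  # first chunk of the speech run
                state, run = State.SPEECH, 0
        else:
            run = run + 1 if not is_speech else 0
            if run >= offset_chunks:
                last_speech_chunk = i - offset_chunks  # chunk before the silence run
                return start_chunk * chunk_size, (last_speech_chunk + 1) * chunk_size
    if start_chunk is not None:
        # Speech never ended within the input: close the segment at the last frame.
        return start_chunk * chunk_size, len(frames)
    return None


if __name__ == "__main__":
    frames = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1,
              0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    print(detect_end_points(frames))  # (8, 28)
```

In this toy input the spurious speech frames in chunks 0 and 8 never start or extend the segment, and the single non-speech chunk 5 inside the utterance does not end it, because each transition requires a run of chunk decisions rather than a single frame.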

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques