Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

Jee-weon Jung; Hee-Soo Heo; Youngki Kwon; Joon Son Chung; Bong-Jin Lee

arXiv:2104.02878·eess.AS·April 8, 2021

TL;DR

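This paper trains an overlapped speech detector as a three-class classifier (non-speech, single-speaker speech, overlapped speech) using a convolutional recurrent neural network. It reports state-of-the-art precision (0.6648) and recall (0.3222) on the DIHARD II evaluation set and improved speaker diarization accuracy.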

Contribution

The work presents a novel three-class classification approach for overlapped speech detection and demonstrates its effectiveness with a convolutional recurrent neural network architecture.

Findings

01. State-of-the-art precision of 0.6648 on DIHARD II

02. Recall improved by 20% over previous methods

03. Third place in DIHARD III speaker diarization challenge

Abstract

In this work, we propose an overlapped speech detection system trained as a three-class classifier. Unlike conventional systems that perform binary classification of whether or not a frame contains overlapped speech, the proposed approach classifies each frame into one of three classes: non-speech, single-speaker speech, and overlapped speech. By training a network with this more detailed label definition, the model can learn a better notion of the number of speakers present in a given frame. A convolutional recurrent neural network architecture is explored to benefit from both the convolutional layers' capability to model local patterns and the recurrent layers' ability to model sequential information. The proposed overlapped speech detection model establishes state-of-the-art performance with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing a 20% increase in…
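The three-class label definition and the precision/recall figures quoted above can be illustrated with a small sketch. This is not the authors' pipeline. It assumes a hypothetical binary speaker-activity matrix (one row per speaker, one column per frame) and scores only the overlapped-speech class frame by frame; the function names, class ids, and toy data are all made up for illustration, and details such as frame length or scoring collars are not given in this summary.

```python
import numpy as np

# Hypothetical class ids for the three labels named in the abstract.
NON_SPEECH, SINGLE, OVERLAP = 0, 1, 2


def three_class_labels(activity):
    """Map a (num_speakers, num_frames) binary activity matrix to per-frame class ids.

    0 active speakers -> NON_SPEECH, 1 -> SINGLE, 2 or more -> OVERLAP.
    """
    counts = np.asarray(activity).sum(axis=0)
    return np.minimum(counts, OVERLAP).astype(np.int64)


def overlap_precision_recall(reference, predicted):
    """Frame-level precision and recall for the overlapped-speech class only."""
    ref = np.asarray(reference) == OVERLAP
    hyp = np.asarray(predicted) == OVERLAP
    true_positives = np.sum(ref & hyp)
    precision = true_positives / hyp.sum() if hyp.sum() else 0.0
    recall = true_positives / ref.sum() if ref.sum() else 0.0
    return float(precision), float(recall)


# Toy example: two speakers, six frames (assumed data, not from the paper).
activity = np.array([
    [0, 1, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
])
labels = three_class_labels(activity)

# A hypothetical detector output that finds only one of the three overlap frames.
predicted = np.array([0, 1, 2, 1, 0, 1])
precision, recall = overlap_precision_recall(labels, predicted)
print(labels.tolist(), precision, recall)
```

In this toy case the detector is precise but misses most overlapped frames, the same high-precision, lower-recall pattern as the reported 0.6648 and 0.3222.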
