Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks

Takuya Yoshioka; Hakan Erdogan; Zhuo Chen; Xiong Xiao; Fil Alleva

arXiv:1810.03655 · eess.AS · October 10, 2018

TL;DR

This paper introduces a neural-network-based unmixing transducer for recognizing overlapped speech in meetings. The resulting system outperforms a state-of-the-art neural mask-based beamformer by 10.8% and is applied to real meeting recordings.

Contribution

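It proposes an unmixing transducer, implemented with a windowed BLSTM, that has a fixed number J of output channels. The module transforms a multi-channel input signal into J time-synchronous audio streams, with each utterance emitted from one of the output channels, which makes separation of overlapped speech applicable to real meeting recordings.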

Findings

01. Outperforms a state-of-the-art neural mask-based beamformer by 10.8%.

02. Yields significant improvements in overlapped speech segments.

03. Is the first application of overlapped speech recognition to real meetings.

Abstract

The goal of this work is to develop a meeting transcription system that can recognize speech even when utterances of different speakers are overlapped. While speech overlaps have been regarded as a major obstacle in accurately transcribing meetings, a traditional beamformer with a single output has been exclusively used because previously proposed speech separation techniques have critical constraints for application to real meetings. This paper proposes a new signal processing module, called an unmixing transducer, and describes its implementation using a windowed BLSTM. The unmixing transducer has a fixed number, say J, of output channels, where J may be different from the number of meeting attendees, and transforms an input multi-channel acoustic signal into J time-synchronous audio streams. Each utterance in the meeting is separated and emitted from one of the output channels. Then,…
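
The output contract described in the abstract (a fixed number J of output channels, each utterance emitted from exactly one of them, and J not necessarily equal to the number of attendees) can be illustrated with a small sketch. This is not the paper's BLSTM and not how the network decides anything; it is a toy scheduler under stated assumptions. The `(speaker, start_sec, end_sec)` utterance format, the `assign_channels` function, and the greedy first-free-channel rule are all hypothetical and introduced here only for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical utterance record: (speaker, start_sec, end_sec).
Utterance = Tuple[str, float, float]


def assign_channels(
    utterances: List[Utterance], num_outputs: int
) -> List[Optional[int]]:
    """Greedily place each utterance on the first output channel that is free.

    Returns one channel index per utterance, or None when all channels
    are busy (more simultaneous speakers than output channels).
    """
    free_at = [0.0] * num_outputs  # time at which each channel becomes free
    order = sorted(range(len(utterances)), key=lambda i: utterances[i][1])
    assignment: List[Optional[int]] = [None] * len(utterances)
    for i in order:
        _, start, end = utterances[i]
        for ch in range(num_outputs):
            if free_at[ch] <= start:
                assignment[i] = ch
                free_at[ch] = end
                break
    return assignment


# Three attendees, but only J = 2 output channels.
meeting = [
    ("alice", 0.0, 4.0),
    ("bob", 3.0, 6.0),    # overlaps with alice
    ("carol", 6.5, 9.0),  # starts after both have finished
]
channels = assign_channels(meeting, num_outputs=2)
print(channels)  # [0, 1, 0]
```

In this toy setting, two channels are enough for three attendees because at most two utterances overlap at any moment. With a single output channel, the overlapping utterance has nowhere to go, which mirrors the limitation of a single-output beamformer that the abstract mentions.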
