Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models

Mahdi Khademian; Mohammad Mehdi Homayounpour

arXiv:1610.01367·cs.CL·October 6, 2016·1 cites

TL;DR

This paper introduces a factorial speech processing model with joint decoding and neural network enhancements for monaural multi-talker speech recognition, surpassing previous super-human performance levels.

Contribution

It develops a novel joint-token passing algorithm that decodes the target and masker speakers simultaneously, improving over traditional two-phase methods that first separate the speech and then recognize it.
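The idea of decoding both speakers at once can be illustrated with a minimal sketch. This is not the paper's joint-token passing algorithm: it is a plain Viterbi search over the product of two small state chains, and every name in it (`joint_viterbi`, `safe_log`, the toy means and transition matrices) is hypothetical. The interaction model, in which the louder source dominates the observed value, is an assumption chosen for simplicity, and gain adaptation is left out entirely.

```python
import math
from itertools import product


def safe_log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def joint_viterbi(obs, trans_a, trans_b, means_a, means_b, sigma=0.5):
    """Return the most likely pair of state paths for two chains seen through one mixture.

    Assumptions (not from the paper): scalar observations, a uniform initial
    distribution, and a max-style interaction between the two sources.
    """
    # Joint state space: every (speaker A state, speaker B state) pair.
    states = list(product(range(len(means_a)), range(len(means_b))))

    def log_emit(y, i, j):
        # Assumed interaction model: the louder source dominates the mixture.
        mu = max(means_a[i], means_b[j])
        return -0.5 * ((y - mu) / sigma) ** 2

    # The uniform initial term is a constant, so it is omitted.
    score = {s: log_emit(obs[0], *s) for s in states}
    backpointers = []
    for y in obs[1:]:
        new_score, pointers = {}, {}
        for i, j in states:
            def arrive(p):
                # Each chain moves independently; the joint transition is their product.
                return score[p] + safe_log(trans_a[p[0]][i]) + safe_log(trans_b[p[1]][j])

            best_prev = max(states, key=arrive)
            new_score[(i, j)] = arrive(best_prev) + log_emit(y, i, j)
            pointers[(i, j)] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final joint state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [a for a, _ in path], [b for _, b in path]


# Toy usage: two 2-state chains with "sticky" transitions.
sticky = [[0.9, 0.1], [0.1, 0.9]]
target_path, masker_path = joint_viterbi(
    [3.0, 5.0, 3.0], sticky, sticky, means_a=[0.0, 5.0], means_b=[1.0, 3.0]
)
```

In the toy usage, the middle frame is explained equally well by either state of the second chain, because the first chain masks it. The joint search resolves that ambiguity through the second chain's transition probabilities, which is the kind of information a separate-then-recognize pipeline cannot use in the same way.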

Findings

01

Outperforms previous super-human speech recognition systems.

02

Achieves a 5.5% absolute performance improvement over the initial super-human models.

03

Attains a 2.7% absolute improvement over recent deep learning-based competitors.

Abstract

A Pascal challenge entitled monaural multi-talker speech recognition was developed to target robust automatic speech recognition against speech-like noise, which significantly degrades the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge a team from IBM Research achieved performance better than that of human listeners on this task. The IBM team's method consists of an intermediate speech separation step followed by single-talker speech recognition. This paper reconsiders the challenge task using gain-adapted factorial speech processing models. It develops a joint-token passing algorithm for direct, simultaneous utterance decoding of both the target and masker speakers.…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques