Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

Hyeongju Kim; Hyeonseung Lee; Woo Hyun Kang; Hyung Yong Kim; Nam Soo Kim

arXiv:2007.12903·cs.SD·July 28, 2020

TL;DR

This paper introduces a flow-based density estimation method for a robust multi-channel speech recognition front-end that improves performance without relying on parallel clean-noisy speech data.

Contribution

The paper proposes a training approach for multi-channel ASR front-ends that applies flow-based density estimation to non-parallel clean and noisy speech, instead of relying on the ASR objective alone.

Findings

01 Outperforms conventional front-end training techniques on the CHiME-4 dataset

02 Remains effective without parallel clean and noisy speech data

03 Improves the robustness of multi-channel ASR systems

Abstract

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require aligned clean and noisy speech pairs which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective is sub-optimal and insufficient for fully training the front-end, which still leaves room for improvement. In this paper, we propose a novel approach which incorporates flow-based density estimation for the robust front-end using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms the conventional techniques where the…
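As a rough illustration of what "flow-based density estimation" means in general, the sketch below evaluates an exact log-likelihood with the change-of-variables formula, log p(x) = log p_z(f(x)) + log|det ∂f/∂x|, using a single elementwise affine map f and a standard normal base density. This is not the authors' model: the excerpt does not describe their flow architecture or how it is coupled to the front-end, and the function name `affine_flow_log_prob`, the parameters `shift` and `log_scale`, and the random "features" are all hypothetical stand-ins.

```python
import numpy as np

def affine_flow_log_prob(x, shift, log_scale):
    """Log-density of x under an elementwise affine flow with a standard normal base.

    Hypothetical toy example; a practical flow would stack many learned invertible layers.
    """
    # Invertible map to the latent space: z = (x - shift) * exp(-log_scale)
    z = (x - shift) * np.exp(-log_scale)
    # Standard normal log-density of z, summed over the feature dimension
    base_log_prob = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi), axis=-1)
    # For an elementwise affine map, log|det dz/dx| = -sum(log_scale)
    log_det = -np.sum(log_scale)
    return base_log_prob + log_det

# Hypothetical stand-in for clean-speech feature vectors (4 frames, 3 dimensions)
rng = np.random.default_rng(0)
features = rng.normal(loc=1.0, scale=2.0, size=(4, 3))

shift = np.full(3, 1.0)              # assumed flow parameters, fixed here rather than learned
log_scale = np.full(3, np.log(2.0))
log_probs = affine_flow_log_prob(features, shift, log_scale)
print(log_probs.shape)
```

Because such a density is differentiable, its log-likelihood could in principle serve as a training signal that needs only unpaired clean speech. How the paper actually uses it is not stated in this excerpt.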
