An Exploration of Mimic Architectures for Residual Network Based Spectral Mapping

Peter Plantinga; Deblin Bagchi; Eric Fosler-Lussier

arXiv:1809.09756·cs.SD·September 27, 2018·1 cites

TL;DR

This paper investigates residual networks as spectral mappers and long-term context in the mimic criterion for speech enhancement, reporting state-of-the-art speech recognition accuracy with a word error rate (WER) of 9.3% on CHiME-2.

Contribution

It replaces the DNN-based spectral mapper with a residual network and uses wide-residual biLSTM networks to bring long-term context into the mimic criterion, improving speech cleaning over DNN-based approaches.

Findings

01 Residual networks outperform DNNs in spectral mapping.

02 Long-term context integration improves speech enhancement.

03 Achieved the lowest word error rate (WER), 9.3%, on the CHiME-2 dataset.
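
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below illustrates that definition only. It is not the scoring tool used in the paper (the abstract refers to the standard Kaldi ASR pipeline), and the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution (sat -> sit) and one deletion (the),
# so 2 errors over 6 reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 9.3% corresponds to roughly 9 word errors per 100 reference words.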

Abstract

Spectral mapping uses a deep neural network (DNN) to map directly from noisy speech to clean speech. Our previous study found that the performance of spectral mapping improves greatly when using helpful cues from an acoustic model trained on clean speech. The mapper network learns to mimic the input favored by the spectral classifier and cleans the features accordingly. In this study, we explore two new innovations: we replace a DNN-based spectral mapper with a residual network that is more attuned to the goal of predicting clean speech. We also examine how integrating long term context in the mimic criterion (via wide-residual biLSTM networks) affects the performance of spectral mapping compared to DNNs. Our goal is to derive a model that can be used as a preprocessor for any recognition system; the features derived from our model are passed through the standard Kaldi ASR pipeline and…
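
The idea described in the abstract can be sketched in a few lines: a residual mapper adds a learned correction to the noisy input, and a classifier trained on clean speech, held fixed, scores how closely the cleaned features resemble clean ones from the classifier's point of view. The code below is a toy illustration under stated assumptions, not the authors' implementation. The single-block mapper, the linear stand-in for the acoustic model, the squared-error form of both losses, the 0.1 weighting, and every name and dimension are hypothetical.

```python
import random


def matvec(weights, vec):
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def residual_mapper(frame, w1, w2):
    """One residual block: output = input + correction(input)."""
    hidden = [max(h, 0.0) for h in matvec(w1, frame)]  # ReLU
    correction = matvec(w2, hidden)
    return [x + c for x, c in zip(frame, correction)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fidelity_loss(denoised, clean):
    """Distance between cleaned and clean features in the feature domain."""
    return mse(denoised, clean)


def mimic_loss(denoised, clean, w_cls):
    """Distance between the frozen classifier's outputs on cleaned vs. clean
    features. w_cls stands in for an acoustic model trained on clean speech
    (assumption: a single linear layer, never updated here)."""
    return mse(matvec(w_cls, denoised), matvec(w_cls, clean))


random.seed(0)
n_bins, n_hidden, n_classes = 4, 6, 3  # toy sizes, chosen arbitrarily

clean = [random.gauss(0, 1) for _ in range(n_bins)]
noisy = [c + random.gauss(0, 0.3) for c in clean]

w1 = [[random.gauss(0, 0.1) for _ in range(n_bins)] for _ in range(n_hidden)]
w2 = [[0.0] * n_hidden for _ in range(n_bins)]  # zero init: mapper starts as identity
w_cls = [[random.gauss(0, 1) for _ in range(n_bins)] for _ in range(n_classes)]

denoised = residual_mapper(noisy, w1, w2)
total = fidelity_loss(denoised, clean) + 0.1 * mimic_loss(denoised, clean, w_cls)
```

With the second layer initialized to zero, the mapper returns its input unchanged, which shows why a residual formulation suits a task where the output is expected to stay close to the input: training only has to learn the correction.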

Code & Models

Repositories

OSU-slatelab/residual_mimic_net (tf · Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing