Low-frequency Compensated Synthetic Impulse Responses for Improved Far-field Speech Recognition

Zhenyu Tang; Hsien-Yu Meng; Dinesh Manocha

arXiv:1910.10815·cs.SD·September 28, 2021

TL;DR

This paper introduces a method for generating low-frequency compensated synthetic impulse responses. Using them to augment clean training speech reduces the word-error-rate of far-field speech recognition by up to 8.8%.

Contribution

It designs linear-phase filters that adapt simulated impulse responses to the equalization distributions of real-world captured impulse responses, making the synthetic responses more realistic.

Findings

01

Reduces word-error-rate by up to 8.8% on the real-world LibriSpeech test set

02

Improves far-field speech recognition performance with synthetic data augmentation

03

Demonstrates effectiveness of low-frequency compensation in impulse response simulation

Abstract

We propose a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets. We design linear-phase filters that adapt the simulated impulse responses to equalization distributions corresponding to real-world captured impulse responses. Our filtered synthetic impulse responses are then used to augment clean speech data from the LibriSpeech dataset [1]. We evaluate the performance of our method on the real-world LibriSpeech test set. In practice, our low-frequency compensated synthetic dataset can reduce the word-error-rate by up to 8.8% for far-field speech recognition.
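
The pipeline described in the abstract (filter a simulated impulse response with a linear-phase filter, then convolve it with clean speech) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the frequency-sampling design, the tap count, and the target gain curve (a hypothetical +6 dB boost below 200 Hz) are all assumptions chosen for illustration. The paper derives its targets from equalization distributions of real captured impulse responses, which are not reproduced here.

```python
import numpy as np

def design_linear_phase_fir(freqs_hz, gains_db, sample_rate=16000, num_taps=255):
    """Frequency-sampling design of a symmetric (linear-phase) FIR filter.

    freqs_hz / gains_db describe a target equalization curve; the values
    passed in below are placeholders, not taken from the paper.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for this sketch")
    n_fft = 4096
    grid = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Interpolate the target curve onto a dense frequency grid.
    magnitude = 10.0 ** (np.interp(grid, freqs_hz, gains_db) / 20.0)
    # A real, zero-phase spectrum yields a real, circularly even response.
    zero_phase = np.fft.irfft(magnitude, n=n_fft)
    half = num_taps // 2
    # Centre the response so the taps are symmetric, then window them.
    taps = np.concatenate([zero_phase[-half:], zero_phase[:half + 1]])
    return taps * np.hamming(num_taps)

def compensate_and_reverberate(speech, synthetic_ir, taps):
    """Filter a synthetic impulse response, then apply it to clean speech."""
    # The linear-phase filter adds a constant delay of (len(taps) - 1) / 2 samples.
    compensated_ir = np.convolve(synthetic_ir, taps)
    reverberant = np.convolve(speech, compensated_ir)[:len(speech)]
    return compensated_ir, reverberant

# Hypothetical target: +6 dB below 200 Hz, flat (0 dB) above 500 Hz.
taps = design_linear_phase_fir([0, 200, 500, 8000], [6.0, 6.0, 0.0, 0.0])

# Stand-ins for real data: noise as "speech", decaying noise as the simulated response.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
synthetic_ir = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)
compensated_ir, far_field = compensate_and_reverberate(speech, synthetic_ir, taps)
```

Symmetric taps guarantee linear phase, so the filter reshapes the magnitude spectrum of the simulated response while only adding a constant delay, leaving the relative timing of its reflections intact.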

Code & Models

Repositories

RoyJames/kaldi
