Deep speech inpainting of time-frequency masks

Mikolaj Kegler; Pierre Beckmann; Milos Cernak

arXiv:1910.09058 · cs.SD · November 12, 2020

TL;DR

This paper introduces an end-to-end speech inpainting framework using a convolutional U-Net trained with deep feature losses, effectively recovering missing or distorted speech segments and improving objective speech quality metrics.

Contribution

The framework's novelty is training the inpainting network with deep feature losses computed by a pre-trained speechVGG model, which improves speech inpainting performance over conventional approaches.
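As a rough illustration of what a deep feature loss computes, the sketch below compares a predicted and a target input in the activation space of a frozen feature extractor instead of comparing them value by value. Everything here is an assumption made for illustration: `make_extractor` builds a toy network with fixed random weights standing in for speechVGG, and the per-layer mean absolute difference is one plausible distance, not necessarily the one used in the paper.

```python
import random

def make_extractor(in_dim, layer_dims, seed=0):
    """Build a toy frozen extractor: one fixed random weight matrix per layer.

    Hypothetical stand-in for a pre-trained network such as speechVGG.
    """
    rng = random.Random(seed)
    layers, prev = [], in_dim
    for dim in layer_dims:
        layers.append([[rng.uniform(-1, 1) for _ in range(prev)] for _ in range(dim)])
        prev = dim
    return layers

def extract(layers, x):
    """Return the ReLU activations of every layer for input vector x."""
    feats, h = [], x
    for weights in layers:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in weights]
        feats.append(h)
    return feats

def deep_feature_loss(layers, predicted, target):
    """Sum over layers of the mean absolute difference between activations."""
    loss = 0.0
    for fp, ft in zip(extract(layers, predicted), extract(layers, target)):
        loss += sum(abs(a - b) for a, b in zip(fp, ft)) / len(fp)
    return loss

# Toy usage: flattened "spectrograms" of 16 values each (assumed sizes).
rng = random.Random(1)
extractor = make_extractor(in_dim=16, layer_dims=[8, 4])
target = [rng.random() for _ in range(16)]
predicted = [v + rng.uniform(-0.1, 0.1) for v in target]

loss_same = deep_feature_loss(extractor, target, target)
loss_diff = deep_feature_loss(extractor, predicted, target)
```

In training, only the inpainting network would be updated to reduce this loss; the extractor's weights stay fixed.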

Findings

01. Successfully recovered up to 400 ms of missing speech segments.

02. Significant improvements in STOI and PESQ metrics.

03. Deep feature loss training outperformed conventional approaches.

Abstract

Transient loud intrusions, often occurring in noisy environments, can completely overpower speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of time-frequency representation of speech. The framework is based on a convolutional U-Net trained via deep feature losses, obtained using speechVGG, a deep speech feature extractor pre-trained on an auxiliary word classification task. Our evaluation results demonstrate that the proposed framework can recover large portions of missing or distorted time-frequency representation of speech, up to 400…
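To make the inpainting task concrete, the sketch below removes a contiguous time gap from a time-frequency representation, producing the kind of corrupted input a network would be asked to fill in. It is a minimal illustration under stated assumptions: the spectrogram is random data, the 10 ms frame hop and the 4 x 100 size are arbitrary, and `mask_time_gap` is a hypothetical helper, not code from the paper. Only time gaps are shown here; a mask could equally cover a band of frequencies.

```python
import random

def mask_time_gap(spec, gap_ms, hop_ms, start_frame):
    """Zero a contiguous run of time frames in a spectrogram.

    spec is a list of frequency rows, each a list of per-frame magnitudes.
    Returns (masked_spec, mask), where mask[t] is 0 for removed frames.
    """
    n_frames = len(spec[0])
    gap_frames = round(gap_ms / hop_ms)          # gap length in frames
    end_frame = min(start_frame + gap_frames, n_frames)
    mask = [0 if start_frame <= t < end_frame else 1 for t in range(n_frames)]
    masked = [[v * m for v, m in zip(row, mask)] for row in spec]
    return masked, mask

# Toy spectrogram: 4 frequency bins x 100 frames of positive values.
random.seed(0)
n_freq, n_frames = 4, 100
spec = [[random.random() + 0.1 for _ in range(n_frames)] for _ in range(n_freq)]

# Remove a 400 ms gap starting at frame 30, assuming a 10 ms hop (40 frames).
masked, mask = mask_time_gap(spec, gap_ms=400, hop_ms=10, start_frame=30)
```

The `mask` returned alongside the corrupted spectrogram marks which frames are missing, which is useful when the position of the gap is known in advance.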

Taxonomy

Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net