Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement

Daniel Haider, Felix Perfler, Vincent Lostanlen, Martin Ehler, and Peter Balazs

arXiv:2408.17358 · cs.SD · September 2, 2024



TL;DR

This paper introduces a stable encoder-decoder architecture for speech enhancement that combines auditory filterbanks, frame theory, and spectral norms to improve training stability and speech quality.

Contribution

The paper proposes a hybrid approach that integrates theory-driven and data-driven methods to train stable 1-D convolutional encoders for speech enhancement.

Findings

1. Significant improvement in PESQ scores.

2. Enhanced stability in training 1-D convolutional encoders.

3. Effective integration of auditory filterbanks and frame theory.

Abstract

Convolutional layers with 1-D filters are often used as a frontend to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via an auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives to the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual…
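
To make the frame-theory idea in the abstract more concrete, here is a minimal sketch of how the frame bounds of a 1-D convolutional filterbank can be measured. It is not the paper's objective, which this excerpt does not spell out. It assumes stride-1 circular convolution, and the names `frame_bounds` and `tightness_penalty` are hypothetical. Under those assumptions the lower and upper frame bounds A and B are the minimum and maximum over frequency of the summed squared magnitude responses, and the bank conserves energy (up to a constant factor) exactly when A = B.

```python
import numpy as np


def frame_bounds(filters, signal_length):
    """Frame bounds (A, B) of a stride-1, circular-convolution filterbank.

    filters: array-like of shape (num_filters, filter_length).
    signal_length: length of the signals the bank is applied to.
    """
    filters = np.asarray(filters, dtype=float)
    # Frequency response of each filter on the DFT grid (zero-padded).
    responses = np.fft.fft(filters, n=signal_length, axis=1)
    # Total energy gain of the whole bank at each frequency bin.
    gain = np.sum(np.abs(responses) ** 2, axis=0)
    return float(gain.min()), float(gain.max())


def tightness_penalty(filters, signal_length):
    """Ratio B / A; equals 1 for a tight frame (illustrative choice only)."""
    lower, upper = frame_bounds(filters, signal_length)
    if lower <= 0.0:
        # Some frequency is fully cancelled: the bank is not invertible.
        return float("inf")
    return upper / lower


# Two-tap Haar pair: a simple bank that is tight at stride 1.
haar = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
print(frame_bounds(haar, 64))
print(tightness_penalty(haar, 64))
```

For the Haar pair, both bounds are 2 up to floating-point error, so the ratio is 1. A single lowpass filter on its own has a zero at the Nyquist frequency and yields a very large or infinite ratio. A training penalty along these lines would need a differentiable framework and, for strided encoders, a different bound computation; both are outside this sketch.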



Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques