A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement

Bengt J. Borgstrom; Michael S. Brandstein

arXiv:2309.12121·eess.AS·September 22, 2023

Open Access

TL;DR

This paper introduces a multiscale autoencoder (MSAE) framework for mask-based, end-to-end neural speech enhancement. The MSAE decomposes the input waveform into band-limited branches, each operating at a different rate and scale, and improves on conventional single-branch autoencoders.

Contribution

It presents a novel multiscale autoencoder architecture with flexible spectral band design, fully differentiable components, and demonstrated superiority over existing systems.

Findings

01. Outperforms conventional single-branch autoencoders.

02. Achieves better speech quality metrics.

03. Improves automatic speech recognition accuracy.

Abstract

Neural network approaches to single-channel speech enhancement have received much recent attention. In particular, mask-based architectures have achieved significant performance improvements over conventional methods. This paper proposes a multiscale autoencoder (MSAE) for mask-based end-to-end neural network speech enhancement. The MSAE performs spectral decomposition of an input waveform within separate band-limited branches, each operating with a different rate and scale, to extract a sequence of multiscale embeddings. The proposed framework features intuitive parameterization of the autoencoder, including a flexible spectral band design based on the Constant-Q transform. Additionally, the MSAE is constructed entirely of differentiable operators, allowing it to be implemented within an end-to-end neural network, and be discriminatively trained. The MSAE draws motivation both from…
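
The abstract mentions a spectral band design based on the Constant-Q transform but does not give the design itself. As a rough illustration only, the sketch below computes textbook constant-Q bands, where center frequencies are geometrically spaced and each band keeps the same ratio of center frequency to bandwidth. The function name `constant_q_bands`, its parameters, and the example frequency range are hypothetical and are not taken from the paper.

```python
import math


def constant_q_bands(f_min, f_max, bins_per_octave):
    """Return (center_hz, bandwidth_hz) pairs with a constant center/bandwidth ratio.

    Hypothetical helper: this is the textbook constant-Q layout,
    not the band design used in the MSAE paper.
    """
    # Quality factor shared by every band for geometrically spaced centers.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bands = []
    k = 0
    while True:
        center = f_min * 2.0 ** (k / bins_per_octave)
        if center > f_max:
            break
        # Bandwidth grows in proportion to the center frequency.
        bands.append((center, center / q))
        k += 1
    return bands


# Assumed example range: one band per octave from 125 Hz to 4 kHz.
bands = constant_q_bands(f_min=125.0, f_max=4000.0, bins_per_octave=1)
for center, bandwidth in bands:
    print(f"center {center:7.1f} Hz  bandwidth {bandwidth:7.1f} Hz")
```

In a layout like this, low-frequency bands are narrow and high-frequency bands are wide, which is one plausible reason each branch of a multiscale model might run at its own rate and scale.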

Taxonomy

Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Speech Recognition and Synthesis