Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks

Sercan O. Arik, Heewoo Jun, and Gregory Diamos

arXiv:1808.06719 · cs.SD · December 26, 2018

TL;DR

This paper introduces MCNN, a multi-head convolutional neural network architecture that enables extremely fast and high-quality speech waveform synthesis directly from spectrograms, outperforming traditional iterative methods in speed and efficiency.

Contribution

The paper presents a novel multi-head CNN architecture for spectrogram inversion that synthesizes waveforms more than 300x faster than real time without iterative algorithms or autoregression, making efficient use of modern multi-core processors while maintaining high speech quality.
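The "300x real-time" figure is a real-time factor: seconds of audio produced per second of compute. A minimal sketch of that arithmetic follows; the function name and the measurement values are hypothetical, chosen only to illustrate the ratio, and are not results from the paper.

```python
def real_time_factor(num_samples: int, sample_rate_hz: int, synthesis_seconds: float) -> float:
    """Return seconds of audio produced per second of wall-clock synthesis time."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    audio_seconds = num_samples / sample_rate_hz  # duration of the generated waveform
    return audio_seconds / synthesis_seconds


# Hypothetical measurement (not from the paper): 10 s of 16 kHz audio generated in 25 ms.
rtf = real_time_factor(num_samples=160_000, sample_rate_hz=16_000, synthesis_seconds=0.025)
print(f"{rtf:.0f}x real-time")
```

A factor above 1 means synthesis runs faster than playback; "more than 300x" corresponds to less than about 3.3 ms of compute per second of audio.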

Findings

01

MCNN achieves over 300x real-time waveform synthesis.

02

It achieves more than an order of magnitude higher compute intensity than iterative algorithms like Griffin-Lim.

03

The approach produces high-quality speech without autoregression.
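The efficiency finding rests on compute intensity, i.e., arithmetic operations per byte of memory traffic. Griffin-Lim alternates inverse and forward STFTs, whose FFTs perform only O(log N) operations per value loaded, while a transposed-convolution layer reuses each weight across every time step. The rough roofline-style estimate below illustrates that gap under stated assumptions: fp32 storage, each tensor moved exactly once, the classic 5 N log2 N FFT operation count, and arbitrary layer sizes. It is not the paper's measurement, and the ratio it prints depends entirely on those choices.

```python
import math

BYTES_PER_VALUE = 4  # assumption: fp32 storage


def transposed_conv1d_intensity(c_in: int, c_out: int, kernel: int, t_in: int, stride: int) -> float:
    """FLOPs per byte for one 1D transposed-convolution layer, assuming each tensor is moved once."""
    macs = c_in * c_out * kernel * t_in  # each input step scatters `kernel` taps per channel pair
    flops = 2 * macs  # one multiply and one add per multiply-accumulate
    # input + output (approximately t_in * stride samples per channel) + weights
    values = c_in * t_in + c_out * t_in * stride + c_in * c_out * kernel
    return flops / (values * BYTES_PER_VALUE)


def fft_intensity(n: int) -> float:
    """FLOPs per byte for one length-n complex FFT, using the 5 n log2(n) operation estimate."""
    flops = 5 * n * math.log2(n)
    bytes_moved = 2 * n * 2 * BYTES_PER_VALUE  # read and write n complex values of 2 floats each
    return flops / bytes_moved


# Hypothetical sizes, chosen only for illustration.
conv = transposed_conv1d_intensity(c_in=64, c_out=64, kernel=32, t_in=1000, stride=2)
fft = fft_intensity(1024)
print(f"transposed conv: {conv:.1f} FLOP/byte, FFT: {fft:.3f} FLOP/byte")
```

The FFT's intensity grows only with log2 N, so it tends to be memory-bound, whereas the convolution's grows with channel count and kernel width; that reuse is what lets dense layers keep many cores busy.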

Abstract

We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. MCNN performs nonlinear interpolation with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly used iterative algorithms like Griffin-Lim, yielding efficient utilization of modern multi-core processors and very fast (more than 300x real-time) waveform synthesis. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.
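The core idea, parallel heads of transposed convolution layers performing nonlinear upsampling, can be sketched in plain Python. This is a toy under several assumptions not taken from the text: a single channel per layer instead of a full spectrogram, ELU after every layer, cropping each layer's output to exactly `stride` samples per input sample, and a weighted sum to merge heads. All function names are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def conv_transpose1d(x, kernel, stride):
    """Single-channel 1D transposed convolution: each input sample scatters a scaled copy of the kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk
    return out


def elu(v, alpha=1.0):
    """Exponential linear unit (assumed nonlinearity for this sketch)."""
    return v if v > 0 else alpha * (math.exp(v) - 1.0)


def head(x, kernels, stride):
    """One head: a stack of transposed convolutions, each followed by ELU."""
    for kernel in kernels:
        # Crop the overlapping tail so each layer outputs exactly len(x) * stride samples
        # (requires len(kernel) >= stride).
        y = conv_transpose1d(x, kernel, stride)[: len(x) * stride]
        x = [elu(v) for v in y]
    return x


def multi_head(x, heads, head_weights, stride):
    """Run all heads on the same input and merge their outputs with per-head weights."""
    outputs = [head(x, kernels, stride) for kernels in heads]
    length = len(outputs[0])  # every head has the same layer count and stride, so lengths match
    return [sum(w * out[t] for w, out in zip(head_weights, outputs)) for t in range(length)]


rng = random.Random(0)
num_heads, num_layers, stride, width = 3, 2, 4, 8
heads = [[[rng.gauss(0.0, 0.3) for _ in range(width)] for _ in range(num_layers)]
         for _ in range(num_heads)]
head_weights = [rng.gauss(1.0, 0.1) for _ in range(num_heads)]
frames = [rng.gauss(0.0, 1.0) for _ in range(5)]  # stand-in for one spectrogram feature per frame
waveform = multi_head(frames, heads, head_weights, stride)
print(len(waveform))  # 5 * 4**2 = 80
```

Each layer multiplies the sequence length by `stride`, so `num_layers` layers produce `stride ** num_layers` samples per frame. Because the heads apply a fixed stack of layers with no sample-by-sample feedback, the whole waveform comes out of one feed-forward pass, which is what makes this style of model amenable to parallel hardware.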

Taxonomy

Methods: Transposed convolution · Convolution