Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers
Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

TL;DR
This paper introduces an ADMM-based optimization approach for mel-spectrogram inversion, improving the joint estimation of magnitude and phase to enhance signal reconstruction quality in speech and sound synthesis.
Contribution
The paper proposes a novel ADMM-based joint estimation method for mel-spectrogram inversion that outperforms existing iterative approaches in efficiency and accuracy.
Findings
Effective reconstruction of speech and sounds demonstrated.
Outperforms previous methods in accuracy and convergence speed.
Joint estimation of magnitude and phase avoids the error accumulation of cascaded magnitude prediction and phase reconstruction.
Abstract
Signal reconstruction from its mel-spectrogram is known as mel-spectrogram inversion and has many applications, including speech and foley sound synthesis. In this paper, we propose a mel-spectrogram inversion method based on a rigorous optimization algorithm. To reconstruct a time-domain signal with inverse short-time Fourier transform (STFT), both full-band STFT magnitude and phase should be predicted from a given mel-spectrogram. Their joint estimation has outperformed the cascaded full-band magnitude prediction and phase reconstruction by preventing error accumulation. However, the existing joint estimation method requires many iterations, and there remains room for performance improvement. We present an alternating direction method of multipliers (ADMM)-based joint estimation method motivated by its success in various nonconvex optimization problems including phase reconstruction.…
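The abstract notes that both the full-band STFT magnitude and the phase must be predicted from the mel-spectrogram. The sketch below illustrates why this is an ill-posed problem. It is not the paper's ADMM method. It only shows the first stage of a cascaded baseline (full-band magnitude prediction by pseudo-inverse), under several assumptions not stated in the source: a triangular filterbank using the common 2595 * log10(1 + f/700) mel formula, a mel-spectrogram computed from the STFT magnitude rather than the power, a Hann window, and illustrative parameter values. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the source does not specify one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def stft(x, n_fft, hop):
    """Plain STFT without padding; returns an array of shape (n_bins, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[t * hop : t * hop + n_fft] * window for t in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

# Illustrative parameters (assumptions, not taken from the paper).
sr, n_fft, hop, n_mels = 16000, 512, 128, 40
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1800 * t)

X = stft(x, n_fft, hop)
mag = np.abs(X)                      # full-band magnitude; the phase of X is discarded
fb = mel_filterbank(n_mels, n_fft, sr)
mel = fb @ mag                       # mel-spectrogram: 40 bands instead of 257 bins

# Cascaded baseline, stage 1: least-squares magnitude estimate, clipped to be nonnegative.
mag_hat = np.maximum(0.0, np.linalg.pinv(fb) @ mel)
rel_err = np.linalg.norm(mag_hat - mag) / np.linalg.norm(mag)
print(f"relative magnitude error after pseudo-inverse: {rel_err:.3f}")
```

Because the filterbank has fewer rows than columns, many magnitudes map to the same mel-spectrogram, so this estimate is generally inexact even before any phase is reconstructed. A cascaded pipeline would pass `mag_hat` on to a separate phase reconstruction step, which is where the error accumulation mentioned in the abstract can arise.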
Taxonomy
Topics: Photonic and Optical Devices · Neural Networks and Applications · Semiconductor Lasers and Optical Devices
