Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase
Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

TL;DR
This paper introduces an optimization-based approach for reconstructing time-domain signals from mel-spectrograms. It jointly estimates the full-band magnitude and phase by exploiting the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram.
Contribution
It presents an optimization framework that jointly reconstructs the full-band magnitude and phase from a mel-spectrogram. This extends phase reconstruction, which conventionally assumes a known full-band STFT magnitude, to low-dimensional spectral representations.
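To make the joint formulation concrete, one plausible way to write it down is sketched below. All symbols are assumptions introduced here for illustration and are not the paper's notation: $\mathcal{G}$ is the STFT operator, $\mathcal{G}^{\dagger}$ its pseudo-inverse (the least-squares inverse STFT), $\mathbf{X} \in \mathbb{C}^{F \times T}$ the STFT coefficients, $\mathbf{A}$ a known full-band magnitude, $\boldsymbol{\Phi} \in \mathbb{R}^{K \times F}$ a mel filterbank with $K < F$, and $\mathbf{M}$ the observed mel-spectrogram. The sketch also assumes the mel-spectrogram is a linear map of the magnitude; the exact objective and constraints used in the paper may differ.

```latex
% Classical phase reconstruction (the consistency criterion behind GLA):
% the full-band magnitude A is known, only the phase is free.
\min_{\mathbf{X}} \; \bigl\| \mathbf{X} - \mathcal{G}\mathcal{G}^{\dagger}\mathbf{X} \bigr\|_F^2
\quad \text{s.t.} \quad |\mathbf{X}| = \mathbf{A}

% Hypothetical joint variant: only the mel-spectrogram M is known,
% so both the magnitude and the phase of X are free.
\min_{\mathbf{X}} \; \bigl\| \mathbf{X} - \mathcal{G}\mathcal{G}^{\dagger}\mathbf{X} \bigr\|_F^2
\quad \text{s.t.} \quad \boldsymbol{\Phi}\,|\mathbf{X}| = \mathbf{M}
```

In the second problem the magnitude constraint is relaxed to a linear mel constraint. The consistency term couples the time-domain signal to its STFT coefficients, and the constraint couples the STFT coefficients to the mel-spectrogram, which is one way to read the two levels of the relationship.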
Findings
Reconstructs speech, music, and environmental signals effectively.
Outperforms traditional phase reconstruction algorithms like Griffin-Lim.
Joint estimation improves signal quality and consistency across diverse audio types.
Abstract
We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals. In this paper, we jointly reconstruct the full-band magnitude and phase by considering the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram. The proposed method is formulated as a rigorous optimization problem and estimates the full-band magnitude based on the criterion used in GLA. Our experiments demonstrate the effectiveness of the proposed method on speech, music, and environmental signals.
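For reference, the GLA baseline mentioned in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration of the classical algorithm for a known full-band magnitude, not the proposed method. The helper names (`stft`, `istft`, `griffin_lim`, `spectral_error`), the frame layout, and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def stft(x, win, hop):
    """Frame x with the window and take a one-sided FFT; returns (frames, bins)."""
    n_fft = len(win)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, win, hop, length):
    """Least-squares inverse STFT: windowed overlap-add with normalization."""
    n_fft = len(win)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    x = np.zeros(length)
    norm = np.zeros(length)
    for t in range(X.shape[0]):
        x[t * hop:t * hop + n_fft] += frames[t]
        norm[t * hop:t * hop + n_fft] += win ** 2
    # Guard against division by zero where the window sum vanishes (signal edges).
    return x / np.maximum(norm, 1e-8)


def griffin_lim(mag, win, hop, n_iter=100, seed=0):
    """Alternate between consistent spectrograms and the given magnitude."""
    rng = np.random.default_rng(seed)
    length = (mag.shape[0] - 1) * hop + len(win)
    # Start from the given magnitude with a random phase.
    X = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(X, win, hop, length)
        C = stft(x, win, hop)                 # STFT of an actual signal (consistent)
        X = mag * np.exp(1j * np.angle(C))    # keep its phase, restore the magnitude
    return istft(X, win, hop, length)


def spectral_error(x, mag, win, hop):
    """Relative mismatch between the STFT magnitude of x and the target magnitude."""
    return np.linalg.norm(np.abs(stft(x, win, hop)) - mag) / np.linalg.norm(mag)
```

A typical use is to compute `mag = np.abs(stft(x, win, hop))` for a test signal, call `griffin_lim(mag, win, hop)`, and watch `spectral_error` shrink as `n_iter` grows. The sketch relies only on STFT redundancy, which is why GLA applies to many kinds of audio, but it needs the full-band magnitude as input, which a mel-spectrogram does not provide directly.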
Taxonomy
Topics: Image and Signal Denoising Methods · Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation
