A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram
Anastasia Natsiou, Sean O'Leary

TL;DR
This paper introduces a sinusoidal model for inverting log-mel-spectrograms, improving sound synthesis quality for musical instruments over existing deep learning methods by better preserving temporal and spectral coherence.
Contribution
The paper presents a novel sinusoidal signal reconstruction method specifically for inverting log-mel-spectrograms, outperforming current deep learning approaches in musical instrument sound synthesis.
Findings
Outperforms state-of-the-art deep learning inversion methods on musical instrument sounds.
Reduces audible distortions in synthesized sounds.
Preserves temporal and spectral coherence effectively.
Abstract
The synthesis of sound via deep learning methods has recently received much attention. Some problems for deep learning approaches to sound synthesis relate to the amount of data needed to specify an audio signal and the necessity of preserving both the long and short time coherence of the synthesised signal. Visual time-frequency representations such as the log-mel-spectrogram have gained in popularity. The log-mel-spectrogram is a perceptually informed representation of audio that greatly compresses the amount of information required for the description of the sound. However, because of this compression, this representation is not directly invertible. Both signal processing and machine learning techniques have previously been applied to the inversion of the log-mel-spectrogram but they both caused audible distortions in the synthesized sounds due to issues of temporal and spectral…
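The claim that the compression makes the representation non-invertible can be illustrated with a small sketch. The code below is not taken from the paper: it assumes an HTK-style mel formula, unnormalised triangular filters, and illustrative settings (16 kHz sampling rate, 1024-point FFT, 80 mel bands), and all function names are hypothetical. It builds two different magnitude spectra that map to the same log-mel frame, so the mapping cannot be undone without extra assumptions about the signal.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters, linear in Hz, with edges equally spaced in mel."""
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    mel_max = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(i * mel_max / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank, edges, bin_hz

def log_mel(bank, spectrum, floor=1e-10):
    # Weighted sum per band, then log compression with a small floor
    return [math.log(sum(w * s for w, s in zip(row, spectrum)) + floor)
            for row in bank]

# Illustrative settings, not taken from the paper
SR, N_FFT, N_MELS = 16000, 1024, 80
bank, edges, bin_hz = mel_filterbank(N_MELS, N_FFT, SR)

# Spectrum A: flat magnitude across all 513 frequency bins
spec_a = [1.0] * len(bin_hz)

# Spectrum B: move energy between three adjacent bins that all lie on the
# falling slope of the last filter. The weights are linear in frequency
# there, so the pattern (+e, -2e, +e) leaves every band sum unchanged.
k = next(i for i, f in enumerate(bin_hz) if f >= edges[-2])
spec_b = list(spec_a)
spec_b[k] += 0.5
spec_b[k + 1] -= 1.0
spec_b[k + 2] += 0.5

log_mel_a = log_mel(bank, spec_a)
log_mel_b = log_mel(bank, spec_b)
max_diff = max(abs(x - y) for x, y in zip(log_mel_a, log_mel_b))

print(len(spec_a), "frequency bins ->", len(log_mel_a), "mel bands")
print("spectra differ:", spec_a != spec_b)
print("largest log-mel difference:", max_diff)
```

Under these assumptions, 513 frequency bins collapse to 80 band energies, and phase is discarded before that step, so any inversion method has to supply the missing detail from a signal model or from learned priors.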
Taxonomy
Topics: Speech and Audio Processing · Neural Networks and Applications · Music and Audio Processing
