Music Enhancement via Image Translation and Vocoding

Nikhil Kandpal; Oriol Nieto; Zeyu Jin

arXiv:2204.13289·cs.SD·April 29, 2022

Open Access

TL;DR

This paper introduces a novel deep learning method combining image translation and vocoding to enhance low-quality music recordings, outperforming classical and end-to-end baselines, and evaluates the reliability of common metrics in music enhancement.

Contribution

It proposes a new approach that manipulates mel-spectrograms with image translation and synthesizes waveforms with vocoding for music enhancement.

Findings

01 Outperforms classical mel-spectrogram inversion methods

02 Surpasses end-to-end waveform mapping baselines

03 Evaluates metric reliability in music enhancement

Abstract

Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ. This paper presents a deep learning approach to enhance low-quality music recordings by combining (i) an image-to-image translation model for manipulating audio in its mel-spectrogram representation and (ii) a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. We find that this approach to music enhancement outperforms baselines which use classical methods for mel-spectrogram inversion and an end-to-end approach directly mapping noisy waveforms to clean waveforms. Additionally, in evaluating the proposed method with a listening test, we analyze the reliability of common audio enhancement evaluation metrics when used in the music domain.
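
The two-stage structure described in the abstract can be sketched roughly as follows. This is only an illustration of the data flow, not the authors' implementation: the frame size, hop length, number of mel bands, and sample rate are assumed values, and `translate` and `vocode` are hypothetical placeholders standing in for the trained image-to-image translation model and the music vocoder.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = mel_pts[i], mel_pts[i + 1], mel_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(wave, sr, n_fft=1024, hop=256, n_mels=80):
    # Assumed analysis settings; the paper's actual values are not given here.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack([wave[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ mag.T  # (n_mels, n_frames)
    return np.log(mel + 1e-5)

def enhance(noisy_wave, sr, translate, vocode):
    # Stage 0: treat the recording as a 2-D "image" (mel bands x time frames).
    noisy_mel = log_mel_spectrogram(noisy_wave, sr)
    # Stage 1: hypothetical image-to-image model maps noisy mel to clean mel.
    clean_mel = translate(noisy_mel)
    # Stage 2: hypothetical vocoder maps the generated mel back to a waveform.
    return vocode(clean_mel)

# Usage with trivial stand-ins (identity "translation", silent "vocoder").
sr = 16000
wave = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
mel = log_mel_spectrogram(wave, sr)
out = enhance(wave, sr,
              translate=lambda m: m,
              vocode=lambda m: np.zeros(m.shape[1] * 256))
```

Passing the two models in as callables keeps the sketch independent of any particular architecture; in practice each stand-in would be replaced by a trained network.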

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation