Audio inpainting with generative adversarial network
P. P. Ebner, A. Eltelt

TL;DR
This paper proposes a new Wasserstein GAN architecture for long-range audio inpainting (500 ms gaps) and reports better inpainting quality and high-frequency reconstruction than the classical WGAN model, evaluated on piano, guitar, and piano-with-string-orchestra recordings.
Contribution
Introduces a new WGAN architecture with short- and long-range border handling for improved long-range audio inpainting.
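For orientation, the generic Wasserstein GAN objective that such an architecture builds on can be written as follows. This is the standard formulation, not the paper's specific loss: the conditioning variable c (standing in for the border context) and the noise input z are assumptions added here for illustration.

```latex
\min_{G} \; \max_{D \,:\, \|D\|_{L} \le 1} \;
  \mathbb{E}_{(x,\,c) \sim p_{\text{data}}} \big[ D(x, c) \big]
  \;-\;
  \mathbb{E}_{z \sim p_{z},\; c \sim p_{\text{data}}} \big[ D(G(z, c), c) \big]
```

Here x is the real audio for the gap, G(z, c) is the generated fill, and the critic D is constrained to be 1-Lipschitz.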
Findings
Proposed model outperforms classical WGAN in audio inpainting quality.
Better reconstruction of high-frequency content.
More effective for instruments with lower frequency spectra.
Abstract
We study the ability of a Wasserstein Generative Adversarial Network (WGAN) to generate missing audio content that is, in context, statistically similar to the sound and the neighboring borders. We address the challenge of inpainting long-range audio gaps (500 ms) with WGAN models. Compared with the classical WGAN model, we improve the quality of the inpainted part with a newly proposed WGAN architecture that uses both short-range and long-range neighboring borders. Performance was compared on two instruments (piano and guitar) and on virtuoso pianists together with a string orchestra. The objective difference grade (ODG) was used to evaluate the performance of both architectures. The proposed model outperforms the classical WGAN model and improves the reconstruction of high-frequency content. Further, we obtained better results for instruments where the frequency…
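To make the short-range and long-range border idea concrete, here is a minimal sketch of how a 500 ms gap and its surrounding context could be cut from a waveform. Only the 500 ms gap length comes from the abstract. The function name, the border durations (250 ms and 2000 ms), and the 16 kHz sample rate are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np


def split_gap_and_borders(signal, sample_rate, gap_start_s,
                          gap_ms=500, short_ms=250, long_ms=2000):
    """Cut out a gap and its short- and long-range borders on both sides.

    short_ms and long_ms are illustrative assumptions, not values
    taken from the paper.
    """
    gap_len = int(sample_rate * gap_ms / 1000)
    short_len = int(sample_rate * short_ms / 1000)
    long_len = int(sample_rate * long_ms / 1000)

    start = int(sample_rate * gap_start_s)
    end = start + gap_len
    if start - long_len < 0 or end + long_len > len(signal):
        raise ValueError("not enough context around the gap")

    return {
        "gap": signal[start:end],                      # target to reconstruct
        "short_left": signal[start - short_len:start],
        "short_right": signal[end:end + short_len],
        "long_left": signal[start - long_len:start],
        "long_right": signal[end:end + long_len],
    }


# Usage: a 6 s synthetic 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(6 * sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)
parts = split_gap_and_borders(audio, sr, gap_start_s=3.0)
```

In a setup like this, a generator would receive the borders as conditioning input and be trained to produce the `gap` segment. How the paper actually combines the two border ranges is not specified in the text above.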
Taxonomy
Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
Methods: Convolution · Wasserstein GAN
