Audio signal interpolation using optimal transportation of spectrograms
David Valdivia, Marien Renaud, Elsa Cazelles, Cédric Févotte

TL;DR
This paper introduces a new audio interpolation method using Wasserstein barycenters of spectrograms, avoiding frame-by-frame processing and incorporating a structured cost matrix for efficient and meaningful transport of spectral energy.
Contribution
The paper proposes a global spectrogram-based interpolation method using optimal transport with a structured cost matrix, improving over previous frame-based approaches.
Findings
Effective interpolation of synthetic and real sounds demonstrated
Reduced computational load due to structured cost matrix
Potential for high-quality audio synthesis and transformation
Abstract
We present a novel approach for generating an artificial audio signal that interpolates between given source and target sounds. Our approach relies on the computation of Wasserstein barycenters of the source and target spectrograms, followed by phase reconstruction and inversion. In contrast with previous works, our new method considers the spectrograms globally and does not operate on a temporal frame-to-frame basis. Another contribution is to endow the transportation cost matrix with a specific structure that prohibits remote displacements of energy along the time axis, and for which optimal transport is made possible by leveraging the unbalanced transport framework. The proposed cost matrix makes sense from the audio perspective and also reduces the computational load. Results with synthetic musical notes and real environmental sounds illustrate the potential of our novel…
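To make the idea of a structured cost matrix concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The function names `banded_time_cost` and `unbalanced_sinkhorn`, the squared index-distance ground cost, the band width `max_shift`, and the values of `eps` and `rho` are all assumptions chosen for illustration. The sketch sets the cost to infinity for any move that shifts energy by more than `max_shift` time frames, then runs a standard entropic unbalanced Sinkhorn scaling (KL-relaxed marginals) between two toy spectrograms. It stops at the transport plan and does not compute a barycenter or reconstruct phase.

```python
import numpy as np

def banded_time_cost(n_freq, n_time, max_shift):
    """Cost between time-frequency bins; moves beyond max_shift frames are forbidden.

    Hypothetical ground cost: squared distance in (frequency, time) index space.
    """
    f, t = np.meshgrid(np.arange(n_freq), np.arange(n_time), indexing="ij")
    f, t = f.ravel(), t.ravel()
    df = f[:, None] - f[None, :]
    dt = t[:, None] - t[None, :]
    cost = (df ** 2 + dt ** 2).astype(float)
    cost[np.abs(dt) > max_shift] = np.inf  # prohibit remote displacement in time
    return cost

def unbalanced_sinkhorn(a, b, cost, eps=0.5, rho=1.0, n_iter=200):
    """Entropic unbalanced OT with KL marginal penalties (scaling iterations)."""
    kernel = np.exp(-cost / eps)  # forbidden entries become exactly 0
    power = rho / (rho + eps)     # exponent below 1 relaxes the marginal constraints
    u = np.ones_like(a)
    v = np.ones_like(b)
    tiny = 1e-300                 # guards against division by zero
    for _ in range(n_iter):
        u = (a / np.maximum(kernel @ v, tiny)) ** power
        v = (b / np.maximum(kernel.T @ u, tiny)) ** power
    return u[:, None] * kernel * v[None, :]

# Toy magnitude spectrograms (4 frequency bins, 6 time frames), strictly positive.
rng = np.random.default_rng(0)
source = rng.random((4, 6)) + 0.1
target = rng.random((4, 6)) + 0.1

cost = banded_time_cost(n_freq=4, n_time=6, max_shift=1)
plan = unbalanced_sinkhorn(source.ravel(), target.ravel(), cost)

print("fraction of allowed moves:", np.isfinite(cost).mean())
print("mass on forbidden moves:", plan[~np.isfinite(cost)].sum())
```

Because the forbidden entries of the kernel are exactly zero, the kernel could be stored as a sparse banded matrix, which is one plausible reading of how such a structure lowers the computational load. The unbalanced formulation matters here because, once remote moves are forbidden, an exact mass-preserving plan between the two spectrograms may not exist.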
