Audio signal interpolation using optimal transportation of spectrograms

David Valdivia; Marien Renaud; Elsa Cazelles; Cédric Févotte

arXiv:2502.15430·eess.SP·April 23, 2025

TL;DR

This paper introduces a new audio interpolation method using Wasserstein barycenters of spectrograms, avoiding frame-by-frame processing and incorporating a structured cost matrix for efficient and meaningful transport of spectral energy.

Contribution

The paper proposes a global spectrogram-based interpolation method using optimal transport with a structured cost matrix, improving over previous frame-based approaches.

Findings

01

Effective interpolation of synthetic and real sounds demonstrated

02

Reduced computational load due to structured cost matrix

03

Potential for high-quality audio synthesis and transformation

Abstract

We present a novel approach for generating an artificial audio signal that interpolates between given source and target sounds. Our approach relies on the computation of Wasserstein barycenters of the source and target spectrograms, followed by phase reconstruction and inversion. In contrast with previous works, our new method considers the spectrograms globally and does not operate on a temporal frame-to-frame basis. Another contribution is to endow the transportation cost matrix with a specific structure that prohibits remote displacements of energy along the time axis, and for which optimal transport is made possible by leveraging the unbalanced transport framework. The proposed cost matrix makes sense from the audio perspective and also reduces the computational load. Results with synthetic musical notes and real environmental sounds illustrate the potential of our novel…
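
To make the role of the structured cost matrix concrete, here is a minimal NumPy sketch, not the authors' implementation. It treats every time-frequency bin of two small random "spectrograms" as a point, forbids transport between bins whose time indices differ by more than `max_shift` (infinite cost), and computes an entropic unbalanced transport plan with the standard KL-relaxed Sinkhorn scaling iterations. The function names `banded_cost` and `unbalanced_sinkhorn`, the squared-distance ground cost, and the parameters `max_shift`, `eps`, and `rho` are all assumptions chosen for illustration. The sketch stops at the transport plan and does not compute a barycenter or reconstruct phase.

```python
import numpy as np

def banded_cost(n_freq, n_time, max_shift):
    """Squared-distance cost between time-frequency bins (an assumed ground cost).

    Pairs of bins whose time indices differ by more than max_shift get an
    infinite cost, so no energy can move between them.
    """
    f, t = np.meshgrid(np.arange(n_freq), np.arange(n_time), indexing="ij")
    f, t = f.ravel(), t.ravel()
    df = f[:, None] - f[None, :]
    dt = t[:, None] - t[None, :]
    cost = (df**2 + dt**2).astype(float)
    cost[np.abs(dt) > max_shift] = np.inf
    return cost, dt

def unbalanced_sinkhorn(a, b, cost, eps=1.0, rho=10.0, n_iter=200):
    """Entropic unbalanced OT with KL-relaxed marginals (scaling iterations)."""
    K = np.exp(-cost / eps)          # exp(-inf) = 0: forbidden pairs vanish
    u = np.ones_like(a)
    v = np.ones_like(b)
    expo = rho / (rho + eps)         # exponent below 1 softens the marginal fit
    for _ in range(n_iter):
        u = (a / (K @ v)) ** expo
        v = (b / (K.T @ u)) ** expo
    return u[:, None] * K * v[None, :]

# Toy data: two strictly positive 4 x 6 "spectrograms", flattened to vectors.
rng = np.random.default_rng(0)
n_freq, n_time, max_shift = 4, 6, 1
source = rng.random((n_freq, n_time)) + 0.1
target = rng.random((n_freq, n_time)) + 0.1

cost, dt = banded_cost(n_freq, n_time, max_shift)
plan = unbalanced_sinkhorn(source.ravel(), target.ravel(), cost)

# Fraction of bin pairs that are allowed to exchange energy at all.
allowed_fraction = np.isfinite(cost).mean()
print(f"allowed pairs: {allowed_fraction:.2f}, transported mass: {plan.sum():.3f}")
```

Because the kernel is exactly zero outside the time band, the plan has the same sparsity pattern as the cost matrix. A sparse or block-wise implementation could exploit this, which is one way to read the abstract's remark about reduced computational load. The unbalanced formulation matters here because, with forbidden pairs, exact marginal constraints may be impossible to satisfy.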
