Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrograms
David Valdivia, Elsa Cazelles, Cédric Févotte

TL;DR
This paper introduces a super-resolution method that fuses spectrograms via optimal transport barycenters, improving time-frequency localization without requiring the inputs to share a time-frequency grid. It is validated on synthetic and speech signals.
Contribution
The paper proposes an OT-barycenter approach to spectrogram fusion that accepts input spectrograms on different time-frequency grids, is computationally efficient, and outperforms state-of-the-art unsupervised fusion techniques.
Findings
The method achieves sharper time-frequency localization.
It outperforms state-of-the-art unsupervised fusion techniques.
Validated on synthetic and speech signals with both quantitative and qualitative results.
Abstract
Time-frequency representations, such as the short-time Fourier transform (STFT), are fundamental tools for analyzing non-stationary signals. However, their ability to achieve sharp localization in both time and frequency is inherently limited by the Gabor-Heisenberg uncertainty principle. In this paper, we address this limitation by introducing a method to generate super-resolution spectrograms through the fusion of two or more spectrograms with varying resolutions. Specifically, we compute the super-resolution spectrogram as the barycenter of input spectrograms using optimal transport (OT) divergences. Unlike existing fusion approaches, our method does not require the input spectrograms to share the same time-frequency grid. Instead, the input spectrograms can be computed using any STFT parameters, and the resulting super-resolution spectrogram can be defined on an arbitrary…
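To make the idea of a barycenter across mismatched grids concrete, here is a minimal 1D sketch. It is not the paper's algorithm: the summary does not describe the paper's transport cost, regularization, or solver, and the paper works with 2D time-frequency spectrograms. The sketch instead uses the closed-form Wasserstein-2 barycenter that exists in one dimension, where the barycenter's quantile function is the weighted average of the input quantile functions. The helper names `quantile_function` and `w2_barycenter_1d`, the toy grids, and the peak positions are all assumptions made for illustration.

```python
import numpy as np

def quantile_function(support, weights, levels):
    """Generalized inverse CDF of a discrete 1D distribution, evaluated at `levels`."""
    cdf = np.cumsum(weights) / np.sum(weights)
    idx = np.searchsorted(cdf, levels, side="left")
    return support[np.clip(idx, 0, len(support) - 1)]

def w2_barycenter_1d(supports, weights, lambdas, out_edges, n_levels=2000):
    """Hypothetical helper: 1D Wasserstein-2 barycenter, binned onto `out_edges`."""
    # Midpoint quantile levels in (0, 1); each carries mass 1 / n_levels.
    levels = (np.arange(n_levels) + 0.5) / n_levels
    # In 1D, the W2 barycenter's quantile function is the weighted
    # average of the input quantile functions.
    q = sum(
        lam * quantile_function(s, w, levels)
        for s, w, lam in zip(supports, weights, lambdas)
    )
    # Bin the barycenter's mass onto an output grid chosen by the caller.
    hist, _ = np.histogram(q, bins=out_edges)
    return hist / n_levels

# Two toy "frequency slices" defined on different grids (illustrative values).
grid_a = np.linspace(0.0, 10.0, 11)   # coarse grid, step 1.0
grid_b = np.linspace(0.0, 10.0, 41)   # fine grid, step 0.25
spec_a = np.zeros_like(grid_a)
spec_a[2] = 1.0                       # all energy at 2.0
spec_b = np.zeros_like(grid_b)
spec_b[18] = 1.0                      # all energy at 4.5

# The output grid is independent of both input grids (bin width 0.5).
out_edges = np.linspace(0.0, 10.0, 21)
bary = w2_barycenter_1d([grid_a, grid_b], [spec_a, spec_b], [0.5, 0.5], out_edges)

# The mass moves to the midpoint 3.25 instead of staying split at 2.0 and 4.5.
print(out_edges[np.argmax(bary)], bary.max())
```

In this toy case an entrywise average would first require resampling both inputs onto a common grid and would then keep two separate half-height peaks. The transport barycenter places a single peak between them, which is the behavior that motivates using OT for fusion.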
