Mono-to-stereo through parametric stereo generation

Joan Serrà; Davide Scaini; Santiago Pascual; Daniel Arteaga; Jordi Pons; Jeroen Breebaart; Giulio Cengarle

arXiv:2306.14647 · cs.SD · June 27, 2023 · 1 citation

Open Access

TL;DR

This paper converts mono audio to stereo by predicting parametric stereo (PS) parameters with nearest neighbor and deep network approaches. It also frames the task generatively, using autoregressive and masked token modelling to synthesize multiple plausible stereo renditions of the same mono signal.

Contribution

The paper combines parametric stereo (PS) parameter prediction with generative modeling for mono-to-stereo conversion, and reports that the PS-based models outperform a competitive classical decorrelation baseline.

Findings

01 PS-based models outperform a competitive classical decorrelation baseline

02 Within the PS prediction framework, generative models outperform equivalent non-generative counterparts

03 The generative approaches can synthesize multiple, equally plausible stereo renditions from a single mono signal

Abstract

Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain a realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by means of predicting parametric stereo (PS) parameters using both nearest neighbor and deep network approaches. In combination with PS, we also propose to model the task with generative approaches, allowing to synthesize multiple and equally-plausible stereo renditions from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work…
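
The abstract does not spell out which PS parameters are predicted. As a rough illustration only, the sketch below assumes two cues commonly used in parametric stereo coding: the inter-channel intensity difference (IID, in dB) and the inter-channel coherence (ICC). The function names `ps_parameters` and `downmix` are hypothetical, and the cues are computed over the whole signal rather than per time-frequency tile, which a practical system would normally do.

```python
import numpy as np


def downmix(left, right):
    """Average the two channels into a mono signal (one simple downmix choice)."""
    return 0.5 * (left + right)


def ps_parameters(left, right, eps=1e-12):
    """Return (iid_db, icc) for a pair of equal-length channel arrays.

    iid_db: level ratio of left over right, in decibels.
    icc:    normalized zero-lag cross-correlation, in [-1, 1].
    Assumption: full-band cues; not the paper's exact parametrization.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    power_left = np.sum(left ** 2)
    power_right = np.sum(right ** 2)
    iid_db = 10.0 * np.log10((power_left + eps) / (power_right + eps))
    icc = np.sum(left * right) / np.sqrt((power_left + eps) * (power_right + eps))
    return iid_db, icc


# Toy example: a source panned towards the left by a factor of two in amplitude.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
iid_db, icc = ps_parameters(2.0 * source, source)
mono = downmix(2.0 * source, source)
```

In this toy case the two channels are scaled copies of one source, so the coherence is close to 1 and the intensity difference is about 6 dB. A mono-to-stereo system in the spirit of the abstract would have to predict such cues from `mono` alone, since the stereo pair is not available at inference time.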

Taxonomy

Topics: Music and Audio Processing · Hearing Loss and Rehabilitation · Speech and Audio Processing