Mono-to-stereo through parametric stereo generation
Joan Serrà, Davide Scaini, Santiago Pascual, Daniel Arteaga, Jordi Pons, Jeroen Breebaart, Giulio Cengarle

TL;DR
This paper converts mono audio to stereo by predicting parametric stereo (PS) parameters with nearest neighbor, deep network, and generative models, aiming for realistic spatial imaging and multiple plausible stereo renditions of the same mono signal.
Contribution
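It combines parametric stereo (PS) parameter prediction with generative modelling (autoregressive and masked token approaches) for mono-to-stereo conversion, and reports that the PS-based models outperform a competitive classical decorrelation baseline.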
Findings
PS-based models outperform a competitive classical decorrelation baseline
Generative models outperform equivalent non-generative counterparts within the PS prediction framework
Generative modelling allows multiple, equally plausible stereo renditions to be synthesized from a single mono signal
Abstract
Generating a stereophonic presentation from a monophonic audio signal is a challenging open task, especially if the goal is to obtain realistic spatial imaging with a specific panning of sound elements. In this work, we propose to convert mono to stereo by predicting parametric stereo (PS) parameters using both nearest neighbor and deep network approaches. In combination with PS, we also propose to model the task with generative approaches, which allows synthesizing multiple, equally plausible stereo renditions from the same mono signal. To achieve this, we consider both autoregressive and masked token modelling approaches. We provide evidence that the proposed PS-based models outperform a competitive classical decorrelation baseline and that, within a PS prediction framework, modern generative models outshine equivalent non-generative counterparts. Overall, our work…
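To make the idea of PS-driven upmixing concrete, the following is a minimal sketch of how a stereo pair could be rebuilt from a mono signal once PS parameters are available. It is not the authors' method. It assumes the parameters are an inter-channel intensity difference in dB (`iid_db`) and an inter-channel coherence (`icc`), as in common PS coding schemes, that a single parameter pair applies to the whole signal rather than to individual time-frequency bands, and that a decorrelated copy of the mono signal (`decorr`, uncorrelated with it and of equal power) is already given. The function names are hypothetical.

```python
import math


def ps_upmix_gains(iid_db, icc):
    """Return (left gain, right gain, mixing angle) for a simplified PS upmix."""
    # Left/right amplitude ratio implied by the intensity difference in dB.
    ratio = 10.0 ** (iid_db / 20.0)
    # Gains chosen so that g_l / g_r == ratio and g_l**2 + g_r**2 == 2.
    g_r = math.sqrt(2.0 / (1.0 + ratio * ratio))
    g_l = ratio * g_r
    # Mixing angle: the normalized left/right correlation equals cos(2 * angle)
    # when mono and decorr are uncorrelated and have equal power.
    angle = 0.5 * math.acos(max(-1.0, min(1.0, icc)))
    return g_l, g_r, angle


def ps_upmix(mono, decorr, iid_db, icc):
    """Mix a mono signal and its decorrelated copy into a left/right pair."""
    g_l, g_r, angle = ps_upmix_gains(iid_db, icc)
    ca, sa = math.cos(angle), math.sin(angle)
    left = [g_l * (ca * m + sa * d) for m, d in zip(mono, decorr)]
    right = [g_r * (ca * m - sa * d) for m, d in zip(mono, decorr)]
    return left, right


if __name__ == "__main__":
    # Toy orthogonal, equal-power sequences standing in for real audio.
    mono = [1.0, 1.0, -1.0, -1.0]
    decorr = [1.0, -1.0, 1.0, -1.0]
    left, right = ps_upmix(mono, decorr, iid_db=6.0, icc=0.5)
    print(left, right)
```

Under these assumptions, predicting a different (`iid_db`, `icc`) pair for the same mono input yields a different stereo rendition, which is the degree of freedom a generative model over PS parameters could exploit.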
Taxonomy
Topics: Music and Audio Processing · Hearing Loss and Rehabilitation · Speech and Audio Processing
