Ambisonizer: Neural Upmixing as Spherical Harmonics Generation

Yongyi Zang; Yifan Wang; Minglun Lee

arXiv:2405.13428 · cs.SD · May 24, 2024

TL;DR

This paper introduces Ambisonizer, a neural upmixing approach that unifies mono and stereo upmixing by generating Ambisonic (spherical harmonics) signals. When decoded to stereo, its output matches a strong commercial stereo widener in subjective ratings.

Contribution

The paper formulates neural upmixing as spherical harmonics (Ambisonic) generation, treating mono upmixing as unconditional generation and stereo upmixing as conditional generation within a single framework that achieves competitive results.

Findings

01. Matches commercial stereo widener in subjective ratings
02. Unified approach improves neural upmixing consistency
03. Direct Ambisonic upmixing shows promising results

Abstract

Neural upmixing, the task of generating immersive music with an increased number of channels from fewer input channels, has been an active research area, with mono-to-stereo and stereo-to-surround upmixing treated as separate problems. In this paper, we propose a unified approach to neural upmixing by formulating it as spherical harmonics generation, or more specifically, Ambisonic generation. We explicitly formulate mono upmixing as unconditional generation and stereo upmixing as conditional generation, where the stereo signals serve as conditions. We provide evidence that our proposed methodology, when decoded to stereo, matches a strong commercial stereo widener in subjective ratings. Overall, our work presents direct upmixing to Ambisonic format as a strong and promising approach to neural upmixing. A discussion of limitations is also provided.
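
To make "Ambisonic format" and "decoded to stereo" concrete, the sketch below encodes a mono signal as a first-order Ambisonic plane wave and decodes it to stereo with two virtual cardioid microphones. It is a generic textbook illustration, not the authors' pipeline: the abstract does not state the Ambisonic order, normalization, or decoder used, so the first-order AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), the virtual-cardioid decode, and the function names `encode_foa` and `decode_stereo` are all assumptions made here. Only the standard library is used so the example stays dependency-free.

```python
import math


def encode_foa(mono, azimuth_deg, elevation_deg=0.0):
    """Encode a mono signal as a first-order Ambisonic plane wave.

    Assumed convention: AmbiX (ACN order W, Y, Z, X; SN3D normalization),
    azimuth measured counterclockwise from the front (positive = left).
    Returns four channels as lists of floats.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = (
        1.0,                          # W: omnidirectional component
        math.sin(az) * math.cos(el),  # Y: left/right
        math.sin(el),                 # Z: up/down
        math.cos(az) * math.cos(el),  # X: front/back
    )
    return [[g * s for s in mono] for g in gains]


def decode_stereo(foa, mic_azimuth_deg=90.0):
    """Decode first-order Ambisonics to stereo with two virtual cardioids.

    The cardioids sit in the horizontal plane at +/- mic_azimuth_deg,
    so the Z channel does not contribute. This is an illustrative
    decoder, not the one used in the paper.
    """
    w, y, _z, x = foa
    theta = math.radians(mic_azimuth_deg)
    c, s = math.cos(theta), math.sin(theta)
    left = [0.5 * (wi + c * xi + s * yi) for wi, xi, yi in zip(w, x, y)]
    right = [0.5 * (wi + c * xi - s * yi) for wi, xi, yi in zip(w, x, y)]
    return left, right


if __name__ == "__main__":
    mono = [1.0, -0.5, 0.25]
    # A source placed hard left lands almost entirely in the left channel.
    left, right = decode_stereo(encode_foa(mono, azimuth_deg=90.0))
    print(left, right)
```

Under these assumptions, a source at azimuth 0 is split equally between the two channels, and a source at +90 degrees appears only in the left channel. A model that generates the four channels directly, as the abstract proposes, could be rendered to stereo or other layouts by swapping the decoder.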

Taxonomy

Topics: Cell Image Analysis Techniques