Ambisonizer: Neural Upmixing as Spherical Harmonics Generation
Yongyi Zang, Yifan Wang, Minglun Lee

TL;DR
This paper introduces Ambisonizer, a neural network approach that unifies mono and stereo upmixing by generating spherical harmonics (Ambisonics) for immersive audio. When decoded to stereo, its output matches a strong commercial stereo widener in subjective ratings.
Contribution
It formulates neural upmixing as spherical harmonics (Ambisonic) generation, treating mono upmixing as unconditional generation and stereo upmixing as conditional generation within a single framework.
Findings
When decoded to stereo, matches a strong commercial stereo widener in subjective ratings
A single formulation covers both mono and stereo upmixing
Direct upmixing to Ambisonic format is presented as a promising approach to neural upmixing
Abstract
Neural upmixing, the task of generating immersive music with an increased number of channels from fewer input channels, has been an active research area, with mono-to-stereo and stereo-to-surround upmixing treated as separate problems. In this paper, we propose a unified approach to neural upmixing by formulating it as spherical harmonics generation, more specifically Ambisonic generation. We explicitly formulate mono upmixing as unconditional generation and stereo upmixing as conditional generation, where the stereo signals serve as conditions. We provide evidence that our proposed methodology, when decoded to stereo, matches a strong commercial stereo widener in subjective ratings. Overall, our work presents direct upmixing to Ambisonic format as a strong and promising approach to neural upmixing. A discussion of limitations is also provided.
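To make "decoded to stereo" concrete, the sketch below shows one common way a first-order Ambisonic signal can be rendered to two channels: a pair of virtual cardioid microphones pointing left and right. This is an illustration only, not the decoder used in the paper, which the abstract does not specify. The function names, the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), and the ±90° microphone angles are all assumptions made for the example.

```python
import numpy as np


def encode_foa(mono, azimuth_rad, elevation_rad=0.0):
    """Encode a mono signal as first-order Ambisonics.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalization,
    azimuth measured counter-clockwise (positive = listener's left).
    """
    mono = np.asarray(mono, dtype=float)
    w = mono
    y = mono * np.sin(azimuth_rad) * np.cos(elevation_rad)
    z = mono * np.sin(elevation_rad)
    x = mono * np.cos(azimuth_rad) * np.cos(elevation_rad)
    return np.stack([w, y, z, x])


def decode_foa_to_stereo(foa, mic_azimuth_rad=np.pi / 2):
    """Decode first-order Ambisonics to stereo with two virtual cardioids.

    The microphones sit in the horizontal plane at +/- mic_azimuth_rad,
    so the height channel Z does not contribute.
    """
    w, y, _z, x = foa

    def cardioid(az):
        # A cardioid pointing at azimuth `az` has gain 0.5 * (1 + cos(theta - az))
        # for a source at azimuth theta.
        return 0.5 * (w + x * np.cos(az) + y * np.sin(az))

    left = cardioid(mic_azimuth_rad)
    right = cardioid(-mic_azimuth_rad)
    return np.stack([left, right])


if __name__ == "__main__":
    signal = np.array([1.0, -0.5, 0.25])
    # A source placed hard left lands in the left channel only.
    print(decode_foa_to_stereo(encode_foa(signal, np.pi / 2)))
```

Under these assumptions, a source encoded at the front (azimuth 0) appears at equal level in both channels, while a source at 90° to the left appears only in the left channel. A generated Ambisonic signal could be passed through the same kind of decoder for a stereo listening comparison.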
