Compression of Higher Order Ambisonics with Multichannel RVQGAN
Toni Hirvonen, Mahmoud Namazi

TL;DR
This paper introduces a multichannel extension of RVQGAN for compressing third-order (16-channel) Ambisonics audio, reaching good quality at 16 kbps in listening tests with immersive 7.1.4 playback.
Contribution
It presents a multichannel extension of the RVQGAN neural coding method for Ambisonics compression, together with a loss function that accounts for spatial perception and transfer learning from single-channel models.
Findings
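Suitable for coding 16-channel, scene-based Ambisonics content at 16 kbps when trained and tested on the EigenScape database
Accepts 16 input and output channels without increasing the model bitrate
Rated as good quality in listening tests with immersive 7.1.4 playback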
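The channel count and bitrate in these findings are linked by simple arithmetic: a full-sphere Ambisonics signal of order N has (N + 1)^2 channels, so third order gives 16. The sketch below checks that and works out the average budget per channel at 16 kbps. The function names are hypothetical and chosen only for illustration, and the per-channel figure is just an average for intuition, since the codec encodes the channels jointly rather than one by one.

```python
def ambisonics_channels(order: int) -> int:
    """Channel count of a full-sphere Ambisonics signal: (order + 1) squared."""
    if order < 0:
        raise ValueError("Ambisonics order must be non-negative")
    return (order + 1) ** 2


def average_kbps_per_channel(total_kbps: float, order: int) -> float:
    """Average bitrate per channel; illustrative only, channels are coded jointly."""
    return total_kbps / ambisonics_channels(order)


print(ambisonics_channels(3))             # 16
print(average_kbps_per_channel(16.0, 3))  # 1.0
```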
Abstract
A multichannel extension to the RVQGAN neural coding method is proposed and realized for data-driven compression of third-order Ambisonics audio. The input and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function that accounts for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.
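One way to see why widening only the input and output layers can leave the bitrate unchanged: in a residual vector quantization (RVQ) codec, the bitrate is set by the latent frame rate, the number of codebooks, and the codebook size, and the number of audio channels does not enter the formula. The sketch below is a minimal illustration of that relationship. The function name and all parameter values are assumptions made for this example, not the configuration reported in the paper.

```python
import math


def rvq_bitrate_kbps(sample_rate_hz: int, hop_length: int,
                     n_codebooks: int, codebook_size: int) -> float:
    """Bitrate of an RVQ bottleneck in kbps.

    Note that the number of audio channels is not a parameter: only the
    latent frame rate and the codebook configuration determine the bitrate.
    """
    frames_per_second = sample_rate_hz / hop_length
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame / 1000.0


# Illustrative (assumed) values: 44.1 kHz audio, one latent frame per
# 512 samples, 9 codebooks with 1024 entries (10 bits) each.
print(round(rvq_bitrate_kbps(44100, 512, 9, 1024), 2))  # 7.75
```

Under these assumptions, feeding the encoder 1 channel or 16 channels yields the same number of codes per second, which matches the abstract's statement that the multichannel layers do not increase the model bitrate.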
Taxonomy
Topics: Advanced Data Compression Techniques
