Compression of Higher Order Ambisonics with Multichannel RVQGAN

Toni Hirvonen; Mahmoud Namazi

arXiv:2411.12008·cs.SD·December 13, 2024

Open Access

TL;DR

This paper extends the RVQGAN neural coding method to multiple channels for compressing third-order (16-channel) Ambisonics audio, reporting good quality at 16 kbps in listening tests with immersive 7.1.4 playback.

Contribution

It presents a multichannel neural coding method for Ambisonics audio compression, together with a loss function that accounts for spatial perception in immersive reproduction and transfer learning from single-channel models.
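One way transfer learning from a single-channel model could be set up is to widen the pretrained first and last convolution kernels to 16 channels and reuse every other weight unchanged. The sketch below is only an illustration under that assumption; the page does not describe the authors' actual initialization, and the function names `inflate_input_conv` and `inflate_output_conv` as well as the weight shapes are hypothetical.

```python
import numpy as np


def inflate_input_conv(w_mono: np.ndarray, n_channels: int) -> np.ndarray:
    """Widen a 1-input-channel 1-D conv kernel to n_channels inputs.

    w_mono has shape (out_channels, 1, kernel_size). Hypothetical helper,
    not taken from the paper.
    """
    if w_mono.ndim != 3 or w_mono.shape[1] != 1:
        raise ValueError("expected shape (out_channels, 1, kernel_size)")
    # Copy the mono kernel to every input channel and rescale, so that the
    # same signal on all channels reproduces the mono layer's response.
    return np.repeat(w_mono, n_channels, axis=1) / n_channels


def inflate_output_conv(w_mono: np.ndarray, b_mono: np.ndarray, n_channels: int):
    """Widen a 1-output-channel conv (weights and bias) to n_channels outputs.

    w_mono has shape (1, in_channels, kernel_size), b_mono has shape (1,).
    Each output channel starts as a copy of the mono output filter.
    """
    if w_mono.ndim != 3 or w_mono.shape[0] != 1:
        raise ValueError("expected shape (1, in_channels, kernel_size)")
    return np.repeat(w_mono, n_channels, axis=0), np.repeat(b_mono, n_channels)


# Example with made-up layer sizes (64 filters, kernel size 7).
rng = np.random.default_rng(0)
w_in = rng.standard_normal((64, 1, 7))
w_in16 = inflate_input_conv(w_in, 16)          # shape (64, 16, 7)

w_out = rng.standard_normal((1, 64, 7))
b_out = np.zeros(1)
w_out16, b_out16 = inflate_output_conv(w_out, b_out, 16)  # (16, 64, 7), (16,)
```

Only the first and last layers change shape in this sketch, so the quantizer and its codebooks would be untouched by the widening.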

Findings

01

Suitable for 16-channel Ambisonics at 16 kbps

02

Input and output layers accept 16 channels without increasing the model bitrate

03

Rated as good quality in listening tests with immersive 7.1.4 playback

Abstract

A multichannel extension to the RVQGAN neural coding method is proposed and realized for data-driven compression of third-order Ambisonics audio. The input and output layers of the generator and discriminator models are modified to accept multiple (16) channels without increasing the model bitrate. We also propose a loss function that accounts for spatial perception in immersive reproduction, and transfer learning from single-channel models. Listening test results with 7.1.4 immersive playback show that the proposed extension is suitable for coding scene-based, 16-channel Ambisonics content with good quality at 16 kbps when trained and tested on the EigenScape database. The model has potential applications for learning other types of content and multichannel formats.
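The claim that the channel count can grow without increasing the bitrate follows from how a residual vector quantizer spends bits: the rate depends on the latent frame rate, the number of codebooks, and the codebook size, not on the number of audio channels entering the encoder. The arithmetic can be sketched as follows. The configuration values (48 kHz sample rate, hop length 480, 16 codebooks of 1024 entries) are hypothetical numbers chosen to land on 16 kbps and are not taken from the paper.

```python
import math


def rvq_bitrate_bps(sample_rate_hz: float, hop_length: int,
                    n_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a residual vector quantizer in bits per second.

    Each latent frame is coded with one index per codebook, and each index
    costs log2(codebook_size) bits. The audio channel count does not appear.
    """
    frames_per_second = sample_rate_hz / hop_length
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame


# Hypothetical configuration, for illustration only.
total_bps = rvq_bitrate_bps(sample_rate_hz=48_000, hop_length=480,
                            n_codebooks=16, codebook_size=1024)
print(total_bps)  # 16000.0

# Spread over 16 Ambisonics channels, this is an average of 1 kbps per channel.
per_channel_bps = total_bps / 16
print(per_channel_bps)  # 1000.0
```

Under these assumptions, a mono model and a 16-channel model with the same quantizer settings produce the same number of bits per second; only the information each code has to carry differs.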

Taxonomy

Topics: Advanced Data Compression Techniques