FOA Tokenizer: Low-bitrate Neural Codec for First Order Ambisonics with Spatial Consistency Loss

Parthasaarathy Sudarsanam; Sebastian Braun; Hannes Gamper

arXiv:2510.22241 · cs.SD · October 28, 2025


TL;DR

This paper introduces FOA Tokenizer, a neural spatial audio codec for first-order ambisonics that achieves low-bitrate compression while preserving spatial cues, enabling accurate reconstruction and useful features for spatial audio tasks.

Contribution

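The paper presents the first discrete neural spatial audio codec for FOA signals. It extends the WavTokenizer architecture to four channels, adds a novel spatial consistency loss that preserves directional cues under heavy compression, and shows that the resulting discrete representations are useful for downstream spatial audio tasks.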

Findings

01

Compresses 4-channel FOA audio at 24 kHz into 75 discrete tokens per second, a bit rate of 0.9 kbps.

02

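Reconstructs spatial cues with mean angular errors of 13.76°, 3.96°, and 25.83° on simulated reverberant mixtures, non-reverberant clean speech, and FOA mixtures with real room impulse responses, respectively.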

03

Provides features beneficial for spatial audio tasks like localization.
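
The mean angular error in finding 02 can be illustrated with a short sketch. This summary does not say how the paper estimates source directions or computes the error, so the code below assumes the common definition: the angle between a reference direction vector and an estimated one, averaged over all pairs. The function names are hypothetical and not taken from the paper.

```python
import math

def angular_error_deg(ref, est):
    """Angle in degrees between two non-zero 3-D direction vectors.

    Assumption: the error is the angle between the vectors; the paper's
    exact metric is not given in this summary.
    """
    dot = sum(a * b for a, b in zip(ref, est))
    norm_ref = math.sqrt(sum(a * a for a in ref))
    norm_est = math.sqrt(sum(b * b for b in est))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_ref * norm_est)))
    return math.degrees(math.acos(cos_angle))

def mean_angular_error_deg(refs, ests):
    """Average angular error over paired reference and estimated directions."""
    errors = [angular_error_deg(r, e) for r, e in zip(refs, ests)]
    return sum(errors) / len(errors)
```

For example, a reference pointing along x and an estimate pointing along y are 90 degrees apart, while identical vectors give an error of 0.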

Abstract

Neural audio codecs have been widely studied for mono and stereo signals, but spatial audio remains largely unexplored. We present the first discrete neural spatial audio codec for first-order ambisonics (FOA). Building on the WavTokenizer architecture, we extend it to support four-channel FOA signals and introduce a novel spatial consistency loss to preserve directional cues in the reconstructed signals under a highly compressed representation. Our codec compresses 4-channel FOA audio at 24 kHz into 75 discrete tokens per second, corresponding to a bit rate of 0.9 kbps. Evaluations on simulated reverberant mixtures, non-reverberant clean speech, and FOA mixtures with real room impulse responses show accurate reconstruction, with mean angular errors of 13.76°, 3.96°, and 25.83°, respectively, across the three conditions. In addition, discrete latent representations…
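
The bit-rate figure in the abstract can be checked with simple arithmetic. The sketch below derives the bits carried by each token from the stated 0.9 kbps and 75 tokens per second. The implied codebook size assumes a single codebook, and the uncompressed baseline assumes 16-bit PCM; the abstract states neither, so both are labeled as assumptions in the code.

```python
TOKENS_PER_SECOND = 75   # from the abstract
BITRATE_BPS = 900        # 0.9 kbps, from the abstract
CHANNELS = 4             # FOA channel count, from the abstract
SAMPLE_RATE_HZ = 24_000  # from the abstract

# Bits carried by each token at the stated rate: 900 / 75 = 12.
bits_per_token = BITRATE_BPS / TOKENS_PER_SECOND

# Assumption: a single codebook, so one token indexes 2**bits entries.
implied_codebook_size = 2 ** int(bits_per_token)

# Assumption: the uncompressed reference is 16-bit PCM.
PCM_BITS_PER_SAMPLE = 16
pcm_bitrate_bps = CHANNELS * SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE

# Ratio of uncompressed to compressed bit rate under the PCM assumption.
compression_ratio = pcm_bitrate_bps / BITRATE_BPS

print(bits_per_token, implied_codebook_size, pcm_bitrate_bps, round(compression_ratio))
```

Under these assumptions each token carries 12 bits, which would correspond to 4096 codebook entries, and the codec reduces a 1536 kbps PCM stream by a factor of roughly 1700.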
