Residual Learning for Neural Ambisonics Encoders

Thomas Deppisch; Yang Gao; Manan Mittal; Benjamin Stahl; Christoph Hold; David Alon; Zamir Ben-Hur

arXiv:2601.18322 · eess.AS · January 27, 2026

TL;DR

This paper proposes a residual learning framework that combines linear and neural network encoders to improve spatial audio capture for wearable devices, demonstrating consistent improvements in real-world scenarios.

Contribution

It introduces a residual learning approach that refines linear Ambisonics encoders with neural networks, enhancing performance in practical applications.

Findings

01

Neural encoders outperform the linear baseline only within the residual framework.

02

Residual models show significant improvements across all metrics for in-domain data.

03

Neural encoders still struggle with high-frequency directional accuracy.

Abstract

Emerging wearable devices such as smartglasses and extended reality headsets demand high-quality spatial audio capture from compact, head-worn microphone arrays. Ambisonics provides a device-agnostic spatial audio representation by mapping array signals to spherical harmonic (SH) coefficients. In practice, however, accurate encoding remains challenging. While traditional linear encoders are signal-independent and robust, they amplify low-frequency noise and suffer from high-frequency spatial aliasing. On the other hand, neural network approaches can outperform linear encoders but they often assume idealized microphones and may perform inconsistently in real-world scenarios. To leverage their complementary strengths, we introduce a residual-learning framework that refines a linear encoder with corrections from a neural network. Using measured array transfer functions from smartglasses,…
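
The residual idea described in the abstract can be sketched for a single frequency bin as follows. This is a minimal illustration, not the authors' implementation: the transfer functions `H` and SH targets `Y` are random placeholders rather than measured data, the linear encoder is assumed to be a Tikhonov-regularized least-squares fit (one common choice for linear encoders), and the `ResidualEncoder` class with its zero-initialized weight matrix `W` is a hypothetical stand-in for the neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, K, T = 5, 64, 4, 100  # microphones, directions, SH channels, time frames

# Synthetic placeholders for one frequency bin (NOT measured data):
# H holds array transfer functions, Y the target SH values per direction.
H = rng.standard_normal((M, D)) + 1j * rng.standard_normal((M, D))
Y = rng.standard_normal((K, D)) + 1j * rng.standard_normal((K, D))


def linear_encoder(H, Y, reg=1e-2):
    """Assumed baseline: regularized least-squares E with E @ H ~= Y.

    Closed form: E = Y H^H (H H^H + reg * I)^(-1).
    """
    gram = H @ H.conj().T + reg * np.eye(H.shape[0])
    rhs = Y @ H.conj().T
    # Solve E @ gram = rhs via the conjugate-transposed system.
    return np.linalg.solve(gram.conj().T, rhs.conj().T).conj().T


class ResidualEncoder:
    """Linear encoder plus an additive correction term (hypothetical)."""

    def __init__(self, E):
        self.E = E
        # Stand-in for a trained network: a zero-initialized linear map,
        # so the model starts out identical to the linear baseline.
        self.W = np.zeros_like(E)

    def correction(self, x):
        return self.W @ x

    def __call__(self, x):
        # Residual structure: baseline estimate + learned correction.
        return self.E @ x + self.correction(x)


E = linear_encoder(H, Y)

# Simulated microphone signals from D random source signals.
s = rng.standard_normal((D, T)) + 1j * rng.standard_normal((D, T))
x = H @ s

encoder = ResidualEncoder(E)
sh_linear = E @ x        # linear baseline estimate
sh_residual = encoder(x)  # baseline plus (currently zero) correction
```

With the correction initialized to zero, the residual model reproduces the linear baseline exactly, so any training of the correction term can only be judged relative to that starting point. A real system would replace `W @ x` with a neural network, and the regularization value `reg` here is arbitrary.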

Taxonomy

Topics: Speech and Audio Processing · Music Technology and Sound Studies · Music and Audio Processing