Beyond Omnidirectional: Neural Ambisonics Encoding for Arbitrary Microphone Directivity Patterns using Cross-Attention
Mikko Heikkinen, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

TL;DR
This paper introduces a neural network method for Ambisonics encoding that uses cross-attention to adapt to arbitrary microphone directivity patterns, improving encoding accuracy for realistic microphone arrays in simulated evaluations.
Contribution
The approach uses directional array transfer functions as metadata and combines them with the audio through cross-attention, so that Ambisonics encoding generalizes to microphone configurations that geometry-only methods cannot characterize.
Findings
Outperforms traditional DSP and neural methods in simulated tests.
Using transfer functions as metadata enhances accuracy for realistic arrays.
Effective in reverberant environments with multiple sound sources.
Abstract
We present a deep neural network approach for encoding microphone array signals into Ambisonics that generalizes to arbitrary microphone array configurations with fixed microphone count but varying locations and frequency-dependent directional characteristics. Unlike previous methods that rely only on array geometry as metadata, our approach uses directional array transfer functions, enabling accurate characterization of real-world arrays. The proposed architecture employs separate encoders for audio and directional responses, combining them through cross-attention mechanisms to generate array-independent spatial audio representations. We evaluate the method on simulated data in two settings: a mobile phone with complex body scattering, and a free-field condition, both with varying numbers of sound sources in reverberant environments. Evaluations demonstrate that our approach…
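The fusion step described in the abstract (separate encoders for audio and directional responses, combined through cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the random linear projections standing in for the two encoders, the choice of audio features as queries and transfer-function features as keys and values, and all dimensions and names (`audio_feats`, `atf_feats`, `d_model`) are assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: (T, d)   one row per audio time frame
    keys:    (D, d)   one row per sampled direction
    values:  (D, d_v) one row per sampled direction
    Returns the fused features (T, d_v) and the attention weights (T, D).
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper.
n_mics, n_frames, n_freq, n_dirs, d_model = 4, 10, 16, 36, 32
in_dim = n_mics * n_freq

# Placeholder inputs: per-frame audio features and array transfer
# functions sampled on a grid of directions (random data here).
audio_feats = rng.standard_normal((n_frames, in_dim))
atf_feats = rng.standard_normal((n_dirs, in_dim))

# Stand-ins for the two encoders: fixed random linear projections.
W_audio = rng.standard_normal((in_dim, d_model)) / np.sqrt(in_dim)
W_key = rng.standard_normal((in_dim, d_model)) / np.sqrt(in_dim)
W_val = rng.standard_normal((in_dim, d_model)) / np.sqrt(in_dim)

q = audio_feats @ W_audio   # audio stream provides the queries (assumption)
k = atf_feats @ W_key       # directional responses provide the keys
v = atf_feats @ W_val       # ... and the values

fused, weights = cross_attention(q, k, v)
print(fused.shape, weights.shape)
```

Each output frame in `fused` is a weighted mix of direction-wise response features, so the audio representation is conditioned on the array's directional characteristics rather than on microphone positions alone. In a trained model the projections would be learned encoders and the fused features would feed a decoder that predicts the Ambisonics channels.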
Taxonomy
Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Aerodynamics and Acoustics in Jet Flows
