Neural Ambisonic Encoding For Multi-Speaker Scenarios Using A Circular Microphone Array
Yue Qiao, Vinay Kothapally, Meng Yu, Dong Yu

TL;DR
This paper introduces a deep learning approach for encoding circular microphone array signals into second-order Ambisonics in multi-speaker scenarios, improving spatial audio quality and source localization over traditional methods.
Contribution
A two-stage neural network architecture for Ambisonic encoding from circular arrays, together with a novel spatial power map loss and a channel permutation technique.
Findings
Outperforms traditional signal processing methods in spatial quality.
Achieves higher source localization accuracy.
Provides better timbral quality in simulated tests.
Abstract
Spatial audio formats like Ambisonics are playback device layout-agnostic and well-suited for applications such as teleconferencing and virtual reality. Conventional Ambisonic encoding methods often rely on spherical microphone arrays for efficient sound field capture, which limits their flexibility in practical scenarios. We propose a deep learning (DL)-based approach, leveraging a two-stage network architecture for encoding circular microphone array signals into second-order Ambisonics (SOA) in multi-speaker environments. In addition, we introduce: (i) a novel loss function based on spatial power maps to regularize inter-channel correlations of the Ambisonic signals, and (ii) a channel permutation technique to resolve the ambiguity of encoding vertical information using a horizontal circular array. Evaluation on simulated speech and noise datasets shows that our approach consistently…
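The two ideas named in the abstract, a loss on spatial power maps and the vertical ambiguity of a horizontal array, can be illustrated with a small NumPy sketch. It is not the authors' implementation: the ACN channel order with SN3D normalization, the plain steered-beam power map, the mean-absolute-error form of the loss, and all function names (`soa_gains`, `encode`, `power_map`, `power_map_loss`) are assumptions made for illustration.

```python
import numpy as np

def soa_gains(az, el):
    """Real spherical-harmonic gains up to order 2 (assumed ACN order, SN3D norm).

    az, el: azimuth and elevation in radians (scalars or same-shape arrays).
    Returns an array of shape (..., 9).
    """
    az, el = np.broadcast_arrays(np.asarray(az, float), np.asarray(el, float))
    c, s = np.cos(el), np.sin(el)
    k = np.sqrt(3.0) / 2.0
    return np.stack([
        np.ones_like(az),                 # 0: W
        np.sin(az) * c,                   # 1: Y
        s,                                # 2: Z (odd in elevation)
        np.cos(az) * c,                   # 3: X
        k * np.sin(2 * az) * c**2,        # 4: V
        k * np.sin(az) * np.sin(2 * el),  # 5: T (odd in elevation)
        0.5 * (3 * s**2 - 1),             # 6: R
        k * np.cos(az) * np.sin(2 * el),  # 7: S (odd in elevation)
        k * np.cos(2 * az) * c**2,        # 8: U
    ], axis=-1)

def encode(signal, az, el):
    """Encode a mono signal (T,) as a plane wave from (az, el); returns (9, T)."""
    return soa_gains(az, el)[:, None] * signal[None, :]

def power_map(soa, grid_az, grid_el):
    """Time-averaged power of SOA signals (9, T) steered to each grid direction."""
    steer = soa_gains(grid_az, grid_el)   # (G, 9)
    beams = steer @ soa                   # (G, T)
    return np.mean(beams**2, axis=1)      # (G,)

def power_map_loss(est, ref, grid_az, grid_el):
    """Mean absolute difference of two power maps (hypothetical loss form)."""
    return np.mean(np.abs(power_map(est, grid_az, grid_el)
                          - power_map(ref, grid_az, grid_el)))

# Two sources mirrored about the horizontal plane.
rng = np.random.default_rng(0)
sig = rng.standard_normal(1000)
az0, el0 = np.deg2rad(40.0), np.deg2rad(30.0)
up = encode(sig, az0, el0)
down = encode(sig, az0, -el0)

odd = [2, 5, 7]                # channels that flip sign with elevation
even = [0, 1, 3, 4, 6, 8]      # channels unchanged by the mirror

# A full-sphere grid separates the two sources; a horizontal-only grid cannot.
az_g, el_g = np.meshgrid(np.deg2rad(np.arange(0, 360, 30)),
                         np.deg2rad(np.arange(-60, 61, 30)))
full_az, full_el = az_g.ravel(), el_g.ravel()
horiz_az = np.deg2rad(np.arange(0, 360, 30))
horiz_el = np.zeros_like(horiz_az)

loss_full = power_map_loss(up, down, full_az, full_el)
loss_horiz = power_map_loss(up, down, horiz_az, horiz_el)
print(loss_full, loss_horiz)
```

In this sketch the mirrored sources differ only in the sign of the Z, T, and S channels, so a map evaluated only in the horizontal plane sees no difference between them, while a map over the full sphere does. This is one way to picture why encoding vertical information from a horizontal circular array is ambiguous; how the paper's channel permutation technique resolves that ambiguity is not described in the text above.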
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
