TL;DR
This paper introduces a non-negative tensor factorization method for separating sound sources in Ambisonic recordings. It incorporates prior knowledge of the source directions-of-arrival within a MAP framework and outperforms the baseline maximum-likelihood method and other techniques.
Contribution
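It derives four algorithms by pairing two cost functions (squared Euclidean distance and Itakura-Saito divergence) with two priors on the spatial covariance matrix (Wishart and Inverse Wishart), integrating spatial information into source separation within a MAP framework, with extensive experimental validation.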
Findings
The proposed MAP methods outperform the baseline ML method and other techniques in separation quality.
Algorithms perform well across different source counts, reverberation levels, and prior knowledge accuracy.
Superior objective metrics (SDR, ISR, SIR, SAR) demonstrate the effectiveness of the approach.
Abstract
This article presents a Non-negative Tensor Factorization based method for sound source separation from Ambisonic microphone signals. The proposed method enables the use of prior knowledge about the Directions-of-Arrival (DOAs) of the sources, incorporated through a constraint on the Spatial Covariance Matrix (SCM) within a Maximum a Posteriori (MAP) framework. Specifically, this article presents a detailed derivation of four algorithms that are based on two types of cost functions, namely the squared Euclidean distance and the Itakura-Saito divergence, which are then combined with two prior probability distributions on the SCM, that is the Wishart and the Inverse Wishart. The experimental evaluation of the baseline Maximum Likelihood (ML) and the proposed MAP methods is primarily based on first-order Ambisonic recordings, using four different source signal datasets, three with musical…
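The two cost functions named in the abstract can be illustrated with a minimal sketch. It assumes element-wise (single-channel) versions applied to non-negative power spectrograms, which is a simplification and not the multichannel formulation derived in the paper. The function names `squared_euclidean` and `itakura_saito`, the `eps` guard, and the toy arrays are hypothetical and chosen only for illustration.

```python
import numpy as np


def squared_euclidean(x, y):
    """Sum of squared element-wise differences between two arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - y) ** 2))


def itakura_saito(x, y, eps=1e-12):
    """Itakura-Saito divergence: sum of x/y - log(x/y) - 1 over all entries.

    `eps` is an illustrative guard against division by zero and log(0).
    """
    x = np.asarray(x, dtype=float) + eps
    y = np.asarray(y, dtype=float) + eps
    ratio = x / y
    return float(np.sum(ratio - np.log(ratio) - 1.0))


# Toy non-negative "spectrograms" (frequency bins x time frames).
rng = np.random.default_rng(0)
observed = rng.uniform(0.1, 1.0, size=(4, 5))
model = rng.uniform(0.1, 1.0, size=(4, 5))

eu_cost = squared_euclidean(observed, model)
is_cost = itakura_saito(observed, model)
```

One practical difference between the two: the Itakura-Saito divergence is scale-invariant, so low-energy and high-energy time-frequency bins contribute comparably, whereas the squared Euclidean distance is dominated by high-energy bins.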
