TL;DR
This paper explores the use of sum-product networks (SPNs) for robust automatic speaker identification, demonstrating their advantages over CNNs in noise robustness and parameter efficiency, with potential for broader speech processing tasks.
Contribution
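The paper introduces SPNs for robust speech processing through a simple robust automatic speaker identification (ASI) task. It shows that SPN speaker models are more robust to noise than two recent CNN-based ASI systems while requiring fewer parameters, which suggests SPNs could become a useful tool for robust speech processing.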
Findings
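SPN speaker models achieve more robust ASI accuracy than two recent CNN-based ASI systems when evaluated on real-world non-stationary and coloured noise sources at multiple SNR levels.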
SPN models have significantly fewer parameters than CNN counterparts.
SPNs show potential for broader speech processing applications, such as automatic speech recognition (ASR) and automatic speaker verification (ASV).
Abstract
We introduce sum-product networks (SPNs) for robust speech processing through a simple robust automatic speaker identification (ASI) task. SPNs are deep probabilistic graphical models capable of answering multiple probabilistic queries. We show that SPNs are able to remain robust by using the marginal probability density function (PDF) of the spectral features that reliably represent speech. Though current SPN toolkits and learning algorithms are in their infancy, we aim to show that SPNs have the potential to become a useful tool for robust speech processing in the future. SPN speaker models are evaluated here on real-world non-stationary and coloured noise sources at multiple signal-to-noise ratio (SNR) levels. In terms of ASI accuracy, we find that SPN speaker models are more robust than two recent convolutional neural network (CNN)-based ASI systems. Additionally, SPN speaker models…
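The marginalisation idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the toy SPN structure, the Gaussian leaves, all parameter values, the speaker names, and the `reliable` mask are hypothetical, and how reliable spectral features are selected is not specified here. The sketch relies on one known property of SPNs: a variable is marginalised by setting every leaf over it to 1, so the marginal PDF costs a single bottom-up pass.

```python
import math


def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class Leaf:
    """Gaussian leaf over one feature (hypothetical leaf type)."""

    def __init__(self, var, mean, std):
        self.var, self.mean, self.std = var, mean, std

    def value(self, x, reliable):
        # A marginalised leaf integrates to 1, so it contributes nothing.
        if not reliable[self.var]:
            return 1.0
        return gaussian_pdf(x[self.var], self.mean, self.std)


class Product:
    """Product node: children cover disjoint sets of features."""

    def __init__(self, children):
        self.children = children

    def value(self, x, reliable):
        result = 1.0
        for child in self.children:
            result *= child.value(x, reliable)
        return result


class Sum:
    """Sum node: weighted mixture of children; weights sum to 1."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children

    def value(self, x, reliable):
        return sum(w * c.value(x, reliable) for w, c in self.weighted_children)


def make_speaker_spn(means_a, means_b):
    """Toy two-component SPN; the structure is an assumption for illustration."""
    comp_a = Product([Leaf(i, m, 1.0) for i, m in enumerate(means_a)])
    comp_b = Product([Leaf(i, m, 1.0) for i, m in enumerate(means_b)])
    return Sum([(0.5, comp_a), (0.5, comp_b)])


# Hypothetical speaker models with made-up parameters.
speakers = {
    "speaker_1": make_speaker_spn([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]),
    "speaker_2": make_speaker_spn([4.0, 4.0, 4.0], [5.0, 5.0, 5.0]),
}


def identify(x, reliable):
    """Pick the speaker whose SPN gives the highest (marginal) density."""
    return max(speakers, key=lambda name: speakers[name].value(x, reliable))


# Features 0 and 1 match speaker_1; feature 2 is corrupted by noise.
x = [0.2, 0.1, 9.0]
print(identify(x, [True, True, True]))   # joint PDF over all features
print(identify(x, [True, True, False]))  # marginal PDF over reliable features
```

In this toy setting the corrupted feature pulls the full joint density towards the wrong speaker, while marginalising it out lets the two clean features decide.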
