Modelling Data Dispersion Degree in Automatic Robust Estimation for Multivariate Gaussian Mixture Models with an Application to Noisy Speech Processing

Dalei Wu; Haiqing Wu

arXiv:1405.4599·cs.CL·May 20, 2014

TL;DR

This paper introduces a novel automatic robust estimation method for multivariate Gaussian mixture models by measuring data dispersion, effectively handling outliers in noisy speech processing and improving speaker recognition accuracy.

Contribution

It proposes a dispersion degree-based approach for automatic outlier removal in MGMM training, with theoretical analysis and practical validation.

Findings

01. Significantly improves robustness in noisy speech processing

02. Theoretical proof of dispersion degree distribution

03. Enhanced speaker recognition performance

Abstract

The trimming scheme with a prefixed cut-off portion is a known method for improving the robustness of statistical models such as multivariate Gaussian mixture models (MGMMs) in small-scale tests by alleviating the impact of outliers. However, when this method is applied to real-world data, such as noisy speech, the optimal cut-off portion for removing the outliers is hard to know, and the scheme sometimes removes useful data samples as well. In this paper, we propose a new method based on measuring the dispersion degree (DD) of the training data to avoid this problem and so realise automatic robust estimation for MGMMs. The DD model is studied using two different measures. For each one, we theoretically prove that the DD of the data samples in the context of MGMMs approximately obeys a specific (chi or chi-square) distribution. The proposed method is evaluated on a real-world…
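The idea of replacing a fixed cut-off portion with a distribution-based threshold can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the DD measures, so the squared Mahalanobis distance to a single Gaussian component with diagonal covariance is assumed here as a stand-in, relying on the standard result that this quantity follows a chi-square distribution with as many degrees of freedom as dimensions. The function names, the confidence level `p`, and the Wilson-Hilferty quantile approximation are all illustrative choices.

```python
import math
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * math.sqrt(c)) ** 3


def squared_mahalanobis_diag(x, mean, var):
    """Squared Mahalanobis distance, assuming a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def trim_outliers(samples, mean, var, p=0.99):
    """Keep samples whose squared distance is below the chi-square quantile.

    Hypothetical stand-in for a DD-based rule: the threshold comes from
    the assumed distribution of the distance, not from a fixed portion
    of the data.
    """
    threshold = chi2_quantile(p, len(mean))
    kept = [
        x for x in samples
        if squared_mahalanobis_diag(x, mean, var) <= threshold
    ]
    return kept, threshold


# Usage: two plausible samples and one gross outlier in 2 dimensions.
samples = [(0.0, 0.0), (1.0, -1.0), (10.0, 10.0)]
kept, threshold = trim_outliers(samples, mean=(0.0, 0.0), var=(1.0, 1.0))
```

With a threshold tied to the distribution, the number of removed samples adapts to the data: a clean set loses almost nothing, while a heavily contaminated one loses more, which is the behaviour a prefixed cut-off portion cannot provide.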

Taxonomy

Topics: Bayesian Methods and Mixture Models · Advanced Statistical Methods and Models · Speech Recognition and Synthesis