Robust Feature Clustering for Unsupervised Speech Activity Detection

Harishchandra Dubey; Abhijeet Sangwan; John H. L. Hansen

arXiv:1806.09301 · cs.SD · June 26, 2018

TL;DR

This paper introduces a robust, unsupervised speech activity detection (SAD) method based on clustering with the Hartigan dip test. It requires no annotated data and outperforms a two-component GMM baseline on NIST public safety communications data.

Contribution

The paper presents a novel unsupervised SAD approach that applies the Hartigan dip test recursively to segment the feature space into prominent modes, making it suitable for zero-resource scenarios.
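The recursive mode-splitting idea can be sketched on a one-dimensional feature as follows. This is not the authors' algorithm. To keep the sketch dependency-free, it swaps in two simpler, plainly different techniques: the large-sample form of Sarle's bimodality coefficient stands in for the Hartigan dip test (a uniform density scores exactly 5/9, and higher values hint at more than one mode), and Otsu's between-class-variance threshold chooses where to split. The names `bimodality_coefficient`, `otsu_cut`, `split_modes`, and the `min_size` and `threshold` parameters are hypothetical.

```python
import statistics


def bimodality_coefficient(xs):
    # Large-sample Sarle's bimodality coefficient: (skew^2 + 1) / kurtosis.
    # STAND-IN for the Hartigan dip test used in the paper, not the same statistic.
    mean = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mean) ** 2 for x in xs])
    if m2 == 0:
        return 0.0  # constant data: treat as unimodal
    m3 = statistics.fmean([(x - mean) ** 3 for x in xs])
    m4 = statistics.fmean([(x - mean) ** 4 for x in xs])
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # non-excess kurtosis
    return (skew ** 2 + 1.0) / kurt


def otsu_cut(xs):
    # Index i (1 <= i < n) maximising between-class variance of xs[:i] vs xs[i:].
    # Assumes xs is sorted; i * (n - i) * (mu0 - mu1)^2 is proportional to it.
    n, total = len(xs), sum(xs)
    best_i, best_score, left = 1, -1.0, 0.0
    for i in range(1, n):
        left += xs[i - 1]
        mu0, mu1 = left / i, (total - left) / (n - i)
        score = i * (n - i) * (mu0 - mu1) ** 2
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def split_modes(values, min_size=20, threshold=5 / 9):
    # Recursively split a 1-D sample until every group looks unimodal.
    # Groups smaller than 2 * min_size are not tested (moment estimates get noisy).
    xs = sorted(values)
    if len(xs) < 2 * min_size or bimodality_coefficient(xs) <= threshold:
        return [xs]
    cut = otsu_cut(xs)  # both halves are non-empty, so the recursion terminates
    return (split_modes(xs[:cut], min_size, threshold)
            + split_modes(xs[cut:], min_size, threshold))
```

Usage: `split_modes(feature_values)` returns the groups in ascending order of feature value. In an SAD setting one might, hypothetically, label the highest-valued group as speech. A real dip-test implementation could replace `bimodality_coefficient` without changing the recursion.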

Findings

01. Outperforms the two-component GMM baseline on the NIST OpenSAD 2015 and OpenSAT 2017 data

02. Robust to distortions, owing to the statistical dip test

03. Effective for zero-resource speech processing

Abstract

In certain applications, such as zero-resource speech processing or very-low-resource speech-language systems, it might not be feasible to collect speech activity detection (SAD) annotations. However, state-of-the-art supervised SAD techniques based on neural networks or other machine learning methods require annotated training data matched to the target domain. This paper establishes a clustering approach for fully unsupervised SAD, useful when SAD annotations are not available. The proposed approach leverages the Hartigan dip test in a recursive strategy for segmenting the feature space into prominent modes. The statistical dip is invariant to distortions, which lends robustness to the proposed method. We evaluate the method on NIST OpenSAD 2015 and NIST OpenSAT 2017 public safety communications data. The results show the superiority of the proposed approach over the two-component GMM…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing