A Fast Audio Clustering Using Vector Quantization and Second Order Statistics

Konstantin Biatov

arXiv:1009.4719·cs.SD·September 27, 2010

TL;DR

This paper introduces a fast, two-stage unsupervised speaker indexing method combining vector quantization and BIC, with an online threshold setting, tested on 10 hours of audio data.

Contribution

It proposes a two-stage clustering algorithm that speeds up BIC-based speaker indexing by running a computationally cheap VQ-based merging stage before the more expensive BIC stage, and it determines the threshold online rather than from development data.

Findings

01

The method reduces computational time compared to traditional BIC-based clustering.

02

The online threshold setting eliminates the need for development data.

03

The approach achieves effective speaker indexing on 10 hours of audio.

Abstract

This paper describes an effective unsupervised speaker indexing approach. We suggest a two-stage algorithm to speed up the state-of-the-art algorithm based on the Bayesian Information Criterion (BIC). In the first stage of the merging process, a computationally cheap method based on vector quantization (VQ) is used. In the second stage, a more computationally expensive technique based on the BIC is applied. The speaker indexing task relies on a tuning parameter, or threshold. We suggest an on-line procedure that defines the value of this tuning parameter without using development data. The results are evaluated using 10 hours of audio data.
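The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not give the VQ distance, the codebook size, or the on-line threshold procedure. Everything here is an assumption made for illustration: the function names (`delta_bic`, `train_codebook`, `vq_distortion`, `should_merge`), the plain k-means codebook, the fixed `vq_threshold`, and the penalty weight `lam`. The BIC stage uses the common full-covariance Gaussian form of ΔBIC, in which a negative value favours merging two clusters.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta BIC for modelling x and y with two Gaussians versus one.

    Positive values favour keeping the clusters apart, negative values
    favour merging. `lam` is the penalty weight (the tuning parameter).
    """
    n1, d = x.shape
    n2 = y.shape[0]
    n = n1 + n2

    def logdet(a):
        # Log-determinant of the maximum-likelihood covariance estimate.
        cov = np.cov(a, rowvar=False, bias=True)
        return np.linalg.slogdet(cov)[1]

    # Number of free parameters of one full-covariance Gaussian, times log n.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    z = np.vstack([x, y])
    return 0.5 * (n * logdet(z) - n1 * logdet(x) - n2 * logdet(y)) - penalty


def train_codebook(x, k=4, iters=10, seed=0):
    """Plain k-means codebook (an assumed stand-in for the paper's VQ)."""
    rng = np.random.default_rng(seed)
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def vq_distortion(x, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.min(axis=1).mean()


def should_merge(x, y, vq_threshold, lam=1.0):
    """Stage 1: cheap VQ screen. Stage 2: BIC test on surviving pairs."""
    if vq_distortion(y, train_codebook(x)) > vq_threshold:
        return False  # rejected cheaply, BIC is never computed
    return delta_bic(x, y, lam) < 0


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(0.0, 1.0, size=(200, 3))  # synthetic "speaker A" frames
    b = rng.normal(0.0, 1.0, size=(200, 3))  # same distribution as A
    c = rng.normal(5.0, 1.0, size=(200, 3))  # clearly different source
    print(should_merge(a, b, vq_threshold=10.0))
    print(should_merge(a, c, vq_threshold=10.0))
```

In this sketch the speed-up comes from the early return in `should_merge`: pairs with a large VQ distortion are discarded before any covariance determinant is computed. The hard-coded `vq_threshold` and `lam` stand in for the values that the paper's on-line procedure would supply.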

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis