A Fast Audio Clustering Using Vector Quantization and Second Order Statistics
Konstantin Biatov

TL;DR
This paper introduces a fast, two-stage unsupervised speaker indexing method that combines vector quantization (VQ) with the Bayesian Information Criterion (BIC) and sets its threshold online. The method is evaluated on 10 hours of audio data.
Contribution
It proposes a two-stage clustering algorithm that speeds up BIC-based speaker indexing by running a cheap VQ-based merging stage before the more expensive BIC stage, and it determines the threshold adaptively.
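The cheap first stage can be illustrated with a VQ cross-distortion score. This is a minimal sketch under stated assumptions, not the paper's implementation: each segment is a list of feature vectors (for example MFCC frames), codebooks are trained with plain k-means, and the helper names (`train_codebook`, `vq_distortion`, `vq_dissimilarity`) and the codebook size `k=4` are hypothetical.

```python
import random

# Hypothetical sketch of a VQ-based segment dissimilarity (not the paper's code).
# A segment is a list of feature vectors; each vector is a list of floats.


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, k, iters=10, seed=0):
    """Train a k-codeword codebook with plain k-means (Lloyd's algorithm).

    Assumption: the paper may train its codebooks differently.
    Requires len(frames) >= k.
    """
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            # Assign each frame to its nearest codeword.
            nearest = min(range(k), key=lambda c: sq_dist(f, codebook[c]))
            clusters[nearest].append(f)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[c] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def vq_distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def vq_dissimilarity(seg_a, seg_b, k=4):
    """Symmetric cross-distortion: how badly each segment is coded by the
    other segment's codebook. Lower values suggest the same speaker."""
    cb_a = train_codebook(seg_a, k)
    cb_b = train_codebook(seg_b, k)
    return vq_distortion(seg_b, cb_a) + vq_distortion(seg_a, cb_b)
```

In a two-stage scheme, a score like this could shortlist candidate segment pairs so that the expensive BIC test runs only on the most promising merges; how the paper actually hands candidates from the VQ stage to the BIC stage is not described on this page.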
Findings
The method reduces computational time compared to traditional BIC-based clustering.
The online threshold setting eliminates the need for development data.
The approach is evaluated on speaker indexing over 10 hours of audio data.
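For reference, BIC-based clustering commonly decides whether to merge two segments with a full-covariance Gaussian ΔBIC test. The standard form is shown below as an assumption for illustration; the paper's exact variant is not given on this page. Here N_1 and N_2 are the frame counts of the two segments, N = N_1 + N_2, Σ_1, Σ_2 and Σ are the maximum-likelihood covariance matrices of each segment and of their union, d is the feature dimension, and λ is a penalty weight.

```latex
\Delta\mathrm{BIC} = \frac{N}{2}\log|\Sigma| - \frac{N_1}{2}\log|\Sigma_1| - \frac{N_2}{2}\log|\Sigma_2| - \lambda P,
\qquad
P = \frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log N
```

A positive ΔBIC favours keeping the two segments apart, and a non-positive value favours merging them. Since covariances and log-determinants must be recomputed for every candidate pair, this test is the expensive part of the merging loop. In this form λ is a natural candidate for the tuning parameter mentioned in the abstract, but that mapping is an assumption.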
Abstract
This paper describes an effective unsupervised speaker indexing approach. We suggest a two-stage algorithm to speed up the state-of-the-art algorithm based on the Bayesian Information Criterion (BIC). In the first stage of the merging process, a computationally cheap method based on vector quantization (VQ) is used. In the second stage, a more computationally expensive technique based on the BIC is applied. Speaker indexing requires a tuning parameter, or threshold. We suggest an on-line procedure for setting the value of this tuning parameter without using development data. The results are evaluated using 10 hours of audio data.
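A minimal pure-Python sketch of a second-stage ΔBIC test, assuming full-covariance Gaussian segment models with maximum-likelihood covariances: ΔBIC = (N/2)log|Σ| − (N_1/2)log|Σ_1| − (N_2/2)log|Σ_2| − λP, with P = ½(d + d(d+1)/2) log N. The function names, the diagonal ridge `eps`, and the default `lam=1.0` are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical sketch of a full-covariance Gaussian Delta-BIC (not the paper's code).
# A segment is a list of feature vectors; each vector is a list of floats.


def log_det_cov(frames, eps=1e-6):
    """Log-determinant of the maximum-likelihood (1/N) covariance of the frames.

    Assumption: a small ridge `eps` is added to the diagonal so that short
    segments do not yield a singular matrix.
    """
    n, d = len(frames), len(frames[0])
    mu = [sum(col) / n for col in zip(*frames)]
    cov = [[sum((f[i] - mu[i]) * (f[j] - mu[j]) for f in frames) / n
            for j in range(d)] for i in range(d)]
    for i in range(d):
        cov[i][i] += eps
    # Cholesky factorisation cov = L L^T, so log|cov| = 2 * sum(log L_ii).
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(d))


def delta_bic(seg_a, seg_b, lam=1.0):
    """Delta-BIC between two segments.

    Positive values favour two separate models (different speakers);
    non-positive values favour merging. `lam` weights the penalty term.
    """
    n1, n2 = len(seg_a), len(seg_b)
    n, d = n1 + n2, len(seg_a[0])
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return (0.5 * n * log_det_cov(seg_a + seg_b)
            - 0.5 * n1 * log_det_cov(seg_a)
            - 0.5 * n2 * log_det_cov(seg_b)
            - lam * penalty)
```

For two segments drawn from the same distribution the result should come out negative (merge), while shifting one segment's mean makes it strongly positive (keep apart). A production system would more likely use a library log-determinant such as NumPy's `slogdet`; the hand-written Cholesky only keeps the sketch dependency-free.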
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
