A high speed unsupervised speaker retrieval using vector quantization and second-order statistics

Konstantin Biatov

arXiv:1008.4658 · cs.IR · September 13, 2010 · 1 citation

TL;DR

This paper presents a fast, unsupervised speaker retrieval method that models audio data with a universal codebook and retrieves in two levels: a vector space search selects the k nearest files, and a second-order statistical measure then ranks those candidates.

Contribution

It introduces a novel two-level unsupervised speaker retrieval technique using vector quantization and second-order statistics, evaluated on broadcast news data.

Findings

01 Effective retrieval on the Ester corpus

02 High speed performance demonstrated

03 Improved accuracy over baseline methods

Abstract

This paper describes an effective unsupervised method for query-by-example speaker retrieval. We assume that each audio file or audio segment contains only one speaker. The audio data are modeled using a common universal codebook based on bag-of-frames (BOF). Features corresponding to the audio frames are extracted from all audio files and grouped into clusters using the K-means algorithm. Each audio file is then modeled by the normalized distribution of its frames over the cluster bins. At the first level, the k files nearest to the query are retrieved using a vector space representation. At the second level, a second-order statistical measure is applied to these k nearest files to produce the final retrieval result. The method is evaluated on a subset of the Ester corpus of French broadcast news.
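The two-level pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name the vector space similarity or the second-order measure, so cosine similarity and the arithmetic-harmonic sphericity of frame covariance matrices are used here as plausible stand-ins. The function names, the hand-written K-means, and the synthetic Gaussian "files" are all hypothetical.

```python
import numpy as np


def assign(frames, centroids):
    """Index of the nearest codebook centroid for every frame."""
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(frames, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns the universal codebook (centroids)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(frames), n_clusters, replace=False)
    centroids = frames[picks].copy()
    for _ in range(n_iter):
        labels = assign(frames, centroids)
        for c in range(n_clusters):
            members = frames[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids


def histogram(frames, centroids):
    """Bag-of-frames model: normalized frame count per cluster bin."""
    counts = np.bincount(assign(frames, centroids), minlength=len(centroids))
    return counts / counts.sum()


def cosine(a, b):
    """Assumed first-level similarity between two histograms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def sphericity(x, y):
    """Assumed second-order measure: arithmetic-harmonic sphericity
    between the frame covariance matrices (0 when they are proportional)."""
    cx, cy = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    d = cx.shape[0]
    a = np.trace(cx @ np.linalg.inv(cy))
    b = np.trace(cy @ np.linalg.inv(cx))
    return float(np.log(a * b) - 2.0 * np.log(d))


def retrieve(query, files, centroids, k=3):
    """Level 1: shortlist k files by histogram similarity.
    Level 2: re-rank the shortlist by the second-order measure."""
    q_hist = histogram(query, centroids)
    sims = [cosine(q_hist, histogram(f, centroids)) for f in files]
    shortlist = np.argsort(sims)[::-1][:k]
    return sorted(shortlist.tolist(), key=lambda i: sphericity(query, files[i]))


def fake_file(rng, mean, std, n_frames=200):
    """Synthetic stand-in for one file's frame features (hypothetical data)."""
    return rng.normal(mean, std, size=(n_frames, len(std)))


rng = np.random.default_rng(0)
# Three synthetic "speakers", each with its own mean and covariance shape.
speakers = [(0.0, [1.0, 1.0, 1.0]), (5.0, [1.0, 3.0, 0.5]), (10.0, [0.3, 1.0, 2.0])]
files = [fake_file(rng, m, s) for m, s in speakers for _ in range(2)]
query = fake_file(rng, *speakers[1])  # unseen file from the second speaker

codebook = kmeans(np.vstack(files), n_clusters=6)
ranking = retrieve(query, files, codebook, k=3)
print(ranking)  # indices into `files`, best match first
```

The split mirrors the speed argument in the abstract: the histogram comparison is cheap and runs over the whole collection, while the covariance-based measure, which needs matrix inversions, runs only on the k shortlisted files.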

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing