Multi-speaker Recognition in Cocktail Party Problem

Yiqian Wang; Wensheng Sun

arXiv:1712.01742 · eess.AS · December 6, 2017

TL;DR

This paper introduces a statistical decision framework for recognizing multiple speakers in cocktail party scenarios, utilizing Gaussian assumptions, voiceprint features, and a novel constellation mapping to improve identification accuracy.

Contribution

It presents a new approach combining Gaussian-based statistical decision theory with Euclidean distance voiceprint analysis for multi-speaker recognition.
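The Gaussian decision step can be illustrated with a minimal sketch. It assumes, as a hypothetical reading of the summary rather than the authors' exact procedure, that each enrolled speaker is modeled by a diagonal-covariance Gaussian over MFCC frame vectors, that a test utterance is scored by its average log-likelihood under each model, and that a fixed `threshold` (a hypothetical parameter) decides whether any enrolled speaker is present at all. The helper names `fit_diag_gaussian`, `log_likelihood`, `identify` and `euclidean` are illustrative only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (mean, variance) per dimension


def fit_diag_gaussian(frames: List[Vector], var_floor: float = 1e-6) -> Model:
    """Estimate a diagonal Gaussian (per-dimension mean and variance) from feature frames."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so a constant dimension cannot cause division by zero.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor) for d in range(dim)]
    return mean, var


def log_likelihood(x: Vector, model: Model) -> float:
    """Log-density of one frame x under N(mean, diag(var))."""
    mean, var = model
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def identify(test_frames: List[Vector], models: Dict[str, Model],
             threshold: float) -> Tuple[Optional[str], Dict[str, float]]:
    """Score each enrolled speaker by average frame log-likelihood.

    Returns (best_speaker, scores); best_speaker is None when even the best
    score falls below the (hypothetical) presence threshold.
    """
    scores = {
        name: sum(log_likelihood(f, model) for f in test_frames) / len(test_frames)
        for name, model in models.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two voiceprint vectors (e.g. model means)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy usage with 2-D features (real MFCC frames would have 13 dimensions).
models = {
    "alice": fit_diag_gaussian([[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]]),
    "bob": fit_diag_gaussian([[5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]),
}
speaker, scores = identify([[0.1, -0.1]], models, threshold=-50.0)
print(speaker, euclidean(models["alice"][0], models["bob"][0]))
```

Averaging rather than summing the frame log-likelihoods keeps the presence threshold independent of utterance length; the 2-dimensional toy features are only for brevity.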

Findings

01

Effective voiceprint feature extraction using Mel-Frequency Cepstral Coefficients

02

Mapping of speaker relationships through a thirteen-dimensional constellation

03

Enhanced recognition accuracy in multi-speaker environments
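The MFCC extraction named in the findings can be sketched for a single frame in plain Python. This follows the textbook pipeline (Hamming window, power spectrum, triangular mel filterbank, log, DCT-II) and keeps 13 coefficients, which plausibly corresponds to the thirteen dimensions mentioned above; the 26-filter bank, the 2595·log10(1 + f/700) mel formula and all function names are common defaults assumed here, not parameters taken from the paper. The DFT is computed naively for clarity, so a real system would use an FFT library instead.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Hamming-window the frame, then return |DFT|^2 / N for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising slope
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope
            row[k] = (right - k) / max(right - center, 1)
        filters.append(row)
    return filters


def mfcc(frame: Sequence[float], sample_rate: int,
         num_filters: int = 26, num_ceps: int = 13) -> List[float]:
    """Return the first num_ceps cepstral coefficients of one frame."""
    spec = power_spectrum(frame)
    fb = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the filter energies so the log is always defined.
    energies = [max(sum(w * p for w, p in zip(row, spec)), 1e-10) for row in fb]
    log_e = [math.log(e) for e in energies]
    m = len(log_e)
    # Unnormalized DCT-II of the log filterbank energies.
    return [sum(le * math.cos(math.pi * c * (j + 0.5) / m) for j, le in enumerate(log_e))
            for c in range(num_ceps)]


sr = 8000
frame = [math.sin(2 * math.pi * 440.0 * t / sr) for t in range(256)]
coeffs = mfcc(frame, sr)
print(len(coeffs))  # 13
```

In practice the signal is split into overlapping frames (for example 20 to 30 ms each) and this per-frame vector is computed for every frame, giving the sequence of feature vectors a speaker model is trained on.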

Abstract

This paper proposes an original statistical decision theory to accomplish the multi-speaker recognition task in the cocktail party problem. The theory relies on the assumption that the varied frequencies of speakers obey a Gaussian distribution and that the relationships between their voiceprints can be represented by Euclidean distance vectors. The paper uses Mel-Frequency Cepstral Coefficients to extract the features of a voice, which are then used to judge whether a given speaker is present in a multi-speaker environment and to determine who that speaker is. Finally, a thirteen-dimensional constellation diagram is constructed by mapping the Manhattan distances between speakers, so as to account thoroughly for the overall influencing factors.
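The distance computations mentioned in the abstract can be sketched as follows. The abstract refers to Euclidean distances in one place and Manhattan distances in another, so this hypothetical helper computes both for every pair of 13-dimensional speaker voiceprints, together with the per-dimension absolute-difference vector whose entries sum to the Manhattan distance. Treating that 13-dimensional vector as one point of the "constellation" is an assumption about the construction, not the authors' stated method, and the voiceprint values are placeholders.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def abs_diff(a: Vector, b: Vector) -> List[float]:
    """Per-dimension |a_i - b_i|; its entries sum to the Manhattan distance."""
    return [abs(x - y) for x, y in zip(a, b)]


def pairwise_distances(
    voiceprints: Dict[str, Vector],
) -> Dict[Tuple[str, str], Tuple[float, float, List[float]]]:
    """For every unordered speaker pair, return (euclidean, manhattan, abs_diff vector)."""
    result = {}
    for (name_a, a), (name_b, b) in combinations(sorted(voiceprints.items()), 2):
        if len(a) != len(b):
            raise ValueError(f"dimension mismatch between {name_a} and {name_b}")
        result[(name_a, name_b)] = (euclidean(a, b), manhattan(a, b), abs_diff(a, b))
    return result


# Placeholder 13-dimensional voiceprints (e.g. mean MFCC vectors); values are made up.
voiceprints = {
    "speaker_a": [0.0] * 13,
    "speaker_b": [1.0] * 13,
    "speaker_c": [0.0] * 12 + [2.0],
}
for pair, (d_e, d_m, point) in pairwise_distances(voiceprints).items():
    print(pair, round(d_e, 3), d_m, len(point))
```

The two metrics can rank pairs differently: speaker_a and speaker_c differ by 2 in a single dimension (both distances equal 2), while speaker_a and speaker_b differ by 1 in all 13 dimensions (Manhattan 13, Euclidean about 3.61), so Manhattan distance weights many small differences more heavily than one large one.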

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis