Multi-speaker Recognition in Cocktail Party Problem
Yiqian Wang, Wensheng Sun

TL;DR
This paper introduces a statistical decision framework for recognizing multiple speakers in cocktail party scenarios, utilizing Gaussian assumptions, voiceprint features, and a novel constellation mapping to improve identification accuracy.
Contribution
It presents a new approach combining Gaussian-based statistical decision theory with Euclidean distance voiceprint analysis for multi-speaker recognition.
Findings
Effective voiceprint feature extraction using Mel-Frequency Cepstral Coefficients
Mapping of speaker relationships through a thirteen-dimensional constellation
Enhanced recognition accuracy in multi-speaker environments
Abstract
This paper proposes an original statistical decision theory for multi-speaker recognition in the cocktail party problem. The theory rests on two assumptions: that the varied frequencies of speakers follow a Gaussian distribution, and that the relationship between their voiceprints can be represented by Euclidean distance vectors. Mel-Frequency Cepstral Coefficients are used to extract voice features, both to judge whether a speaker is present in a multi-speaker environment and to identify who that speaker is. Finally, a thirteen-dimensional constellation diagram is constructed by mapping from the Manhattan distances between speakers, in order to account thoroughly for the overall influencing factors.
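To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each utterance has already been reduced to a single 13-dimensional feature vector (for example, averaged MFCCs), and it uses a simple "within k standard deviations" acceptance rule as a stand-in for the paper's decision theory, which the abstract does not spell out. The helper names `enroll` and `identify`, the threshold `k`, and the toy vectors are all hypothetical.

```python
import math
from statistics import mean, stdev


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def enroll(frames):
    """Build a hypothetical speaker model from enrollment vectors.

    The model stores the centroid of the vectors plus the mean and sample
    standard deviation of their Euclidean distances to that centroid,
    i.e. a Gaussian fitted to within-speaker distances (an assumption).
    """
    centroid = [mean(column) for column in zip(*frames)]
    dists = [euclidean(f, centroid) for f in frames]
    return {"centroid": centroid, "mu": mean(dists), "sigma": stdev(dists)}


def identify(vector, models, k=3.0):
    """Return the closest enrolled speaker whose Gaussian accepts the vector.

    A speaker accepts the vector if its distance to the centroid is at most
    mu + k * sigma. Returns None if no enrolled speaker accepts it.
    """
    best_name, best_dist = None, float("inf")
    for name, model in models.items():
        d = euclidean(vector, model["centroid"])
        if d <= model["mu"] + k * model["sigma"] and d < best_dist:
            best_name, best_dist = name, d
    return best_name


# Toy 13-dimensional "voiceprints"; real inputs would come from an MFCC front end.
models = {
    "A": enroll([[0.9] * 13, [1.0] * 13, [1.1] * 13]),
    "B": enroll([[4.9] * 13, [5.0] * 13, [5.1] * 13]),
}
```

With these toy models, `identify([1.05] * 13, models)` returns `"A"`, while `identify([3.0] * 13, models)` returns `None` because the vector lies outside both acceptance regions. Calling `manhattan` on each pair of speaker centroids would give the kind of pairwise distances that the abstract says feed the constellation mapping, although the mapping itself is not described there and is not reproduced here.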
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
