Multi-Label Training for Text-Independent Speaker Identification

Yuqi Xue

arXiv:2211.07373 · eess.AS · August 19, 2024

TL;DR

This paper introduces Multi-Label Training (MLT) for text-independent speaker identification, which improves accuracy by splitting each speaker's speech into subgroups with different labels, gaining ensemble-like benefits without a significant increase in computation or storage.

Contribution

The paper proposes a novel Multi-Label Training strategy that enhances speaker identification accuracy and robustness, applicable to existing models without significant computational overhead.

Findings

01 MLT improves identification accuracy in both clean and noisy conditions.

02 MLT leverages ensemble effects to enhance model performance.

03 The strategy is easily adaptable to current speaker identification systems.

Abstract

In this paper, we propose a novel strategy for text-independent speaker identification systems: Multi-Label Training (MLT). Instead of the commonly used one-to-one correspondence between speech and speaker label, we divide each speaker's speech into several subgroups and assign each subgroup a different set of labels. During identification, a speaker is identified as long as the predicted label matches one of his/her corresponding labels. We found that this method forces the model to distinguish the data more accurately and to some extent takes advantage of ensemble learning, while avoiding a significant increase in computation and storage. In the experiments, we found that not only in clean conditions but also in noisy conditions with speech enhancement, Multi-Label Training still achieves better identification performance…
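
The labeling scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes exactly one label per subgroup, a random split of each speaker's utterances into k subgroups, and the hypothetical helper names `assign_multi_labels` and `identify`. The abstract does not specify how subgroups are formed or how many labels each one receives.

```python
import random
from collections import defaultdict


def assign_multi_labels(utterances, k, seed=0):
    """Split each speaker's utterances into k subgroups, one label per subgroup.

    utterances: list of (utterance_id, speaker) pairs.
    Returns (utt_to_label, label_to_speaker).
    Assumption: subgroups are formed by a seeded random shuffle.
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for utt_id, speaker in utterances:
        by_speaker[speaker].append(utt_id)

    utt_to_label = {}
    label_to_speaker = {}
    for s_idx, speaker in enumerate(sorted(by_speaker)):
        # Reserve k consecutive class labels for this speaker.
        for j in range(k):
            label_to_speaker[s_idx * k + j] = speaker
        utts = by_speaker[speaker][:]
        rng.shuffle(utts)
        # Deal utterances round-robin into the k subgroups.
        for i, utt_id in enumerate(utts):
            utt_to_label[utt_id] = s_idx * k + (i % k)
    return utt_to_label, label_to_speaker


def identify(predicted_label, label_to_speaker):
    """A speaker is identified if the prediction is any of their labels."""
    return label_to_speaker[predicted_label]


if __name__ == "__main__":
    data = [(f"alice_{i}", "alice") for i in range(4)]
    data += [(f"bob_{i}", "bob") for i in range(4)]
    utt_to_label, label_to_speaker = assign_multi_labels(data, k=2)
    print(utt_to_label)
    print(identify(3, label_to_speaker))
```

Under these assumptions a classifier would be trained on S × k output classes for S speakers, using `utt_to_label` as training targets, and `identify` would map its prediction back to a speaker at test time.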

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Text and Document Classification Technologies