MIRNet: Learning multiple identities representations in overlapped speech

Hyewon Han; Soo-Whan Chung; Hong-Goo Kang

arXiv:2008.01698 · eess.AS · August 7, 2020 · 1 citation

TL;DR

This paper introduces a deep learning method that extracts multiple speaker identities from overlapped speech signals, supporting speaker verification and speech separation without requiring reference acoustic features for training.

Contribution

It proposes a novel deep speaker representation network that extracts multiple speaker identities directly from overlapped speech using only the speaker identity labels of each segment, whereas conventional approaches need reference acoustic features for training.
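
To illustrate what training "using only identity labels" can look like for a mixture, the sketch below assumes a two-speaker segment, a network with K = 2 output embeddings, and a linear classifier that maps each embedding to scores over the training speakers. Because the labels do not say which output slot belongs to which speaker, it evaluates every assignment and keeps the lowest cross-entropy (a permutation-invariant objective). This is a hypothetical illustration of the idea, not the loss used in MIRNet; the function names and array shapes are assumptions.

```python
import itertools
from typing import Sequence

import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def pit_identity_loss(logits: np.ndarray, labels: Sequence[int]) -> float:
    """Permutation-invariant cross-entropy for one overlapped segment.

    logits: (K, N) array, one row of class scores per extracted embedding
            (K output slots, N training speakers). Hypothetical setup.
    labels: the K speaker IDs present in the mixture, in arbitrary order.
    The slot-to-speaker assignment is unknown, so the loss keeps the
    best assignment over all permutations.
    """
    log_probs = log_softmax(logits)
    k = logits.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(k)):
        # Mean cross-entropy when slot i is paired with labels[perm[i]]
        ce = -sum(log_probs[i, labels[perm[i]]] for i in range(k)) / k
        best = min(best, ce)
    return float(best)
```

For example, with logits `[[0.1, 3.0, 0.2, 0.0], [2.5, 0.1, 0.0, 0.3]]`, slot 0 favors speaker 1 and slot 1 favors speaker 0, so the loss picks the swapped assignment and returns the same value whether the labels are given as `[0, 1]` or `[1, 0]`.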

Findings

01. Effective for speaker verification
02. Improves speech separation conditioned on speaker embeddings
03. Requires only speaker identity labels for training

Abstract

Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are multiple concurrent speakers in a given signal. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that can extract a high-level embedding that contains information about each speaker's identity from a given mixture. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm only requires the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and a speech separation system…
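
The abstract evaluates the extracted embeddings in a speaker verification task. One plausible way to use several embeddings from a single mixture, assumed here for illustration only, is to compare an enrollment embedding of the claimed speaker against each extracted embedding with cosine similarity and accept the claim if the best score clears a threshold. The function names and the 0.5 threshold are hypothetical; the paper's actual scoring protocol is not described in this summary.

```python
from typing import Sequence, Tuple

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify_in_mixture(
    enroll: np.ndarray,
    mixture_embeddings: Sequence[np.ndarray],
    threshold: float = 0.5,  # hypothetical operating point
) -> Tuple[bool, float]:
    """Decide whether the enrolled speaker is one of the speakers in a mixture.

    Each embedding extracted from the mixture is treated as a candidate
    identity; the claim is accepted when the closest candidate is
    similar enough to the enrollment embedding.
    """
    scores = [cosine_similarity(enroll, e) for e in mixture_embeddings]
    best = max(scores)
    return best >= threshold, best
```

Taking the maximum over candidates reflects that the claimed speaker only needs to match one of the concurrent speakers, not the mixture as a whole. With `enroll = [1, 0, 0]` and candidates `[0, 1, 0]` and `[0.9, 0.1, 0]`, the second candidate scores above 0.99 and the claim is accepted.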

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing