Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Chiang-Jen Peng, Yun-Ju Chan, Cheng Yu, Syu-Siang Wang, Yu Tsao, and Tai-Shih Chi

arXiv:2101.02550 · eess.AS · February 23, 2021

TL;DR

This paper introduces an attention-based multi-task learning system that simultaneously enhances speech quality and identifies speakers in noisy multi-speaker dialogue scenarios, improving performance in both tasks.

Contribution

The study proposes an attention-based multi-task learning (ATM) framework that jointly performs speech enhancement (SE) and speaker identification (SI), pairing an LSTM-based SE model with DNN-based SI and attention (AttNet) models to obtain more robust features.
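Multi-task learning of this kind is commonly trained by minimizing a weighted sum of per-task losses. The minimal sketch below assumes a mean-squared-error loss on enhanced spectral frames for SE and a cross-entropy loss on speaker logits for SI; the weight `alpha` and the function names `se_loss`, `si_loss`, and `mtl_loss` are hypothetical, and the paper's exact objective may differ.

```python
import numpy as np

def se_loss(enhanced, clean):
    """Mean squared error between enhanced and clean spectral frames (SE task)."""
    return float(np.mean((enhanced - clean) ** 2))

def si_loss(logits, label):
    """Cross-entropy of one utterance's speaker logits against its integer label (SI task)."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))  # log-softmax
    return float(-log_probs[label])

def mtl_loss(enhanced, clean, logits, label, alpha=0.5):
    """Hypothetical joint objective: SE loss plus an alpha-weighted SI loss."""
    return se_loss(enhanced, clean) + alpha * si_loss(logits, label)

if __name__ == "__main__":
    enhanced = np.ones((2, 3))   # toy "enhanced" frames
    clean = np.zeros((2, 3))     # toy clean targets, so the SE loss is 1.0
    logits = np.zeros(4)         # uniform over 4 speakers, so the SI loss is ln 4
    print(round(mtl_loss(enhanced, clean, logits, label=2), 4))  # 1.6931
```

The weight `alpha` trades enhancement fidelity against speaker discrimination; in practice it would be tuned on held-out data.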

Findings

01

Enhanced speech quality and intelligibility in noisy environments.

02

Improved speaker identification accuracy.

03

Effective multi-task learning architecture demonstrated on Mandarin test sentences.

Abstract

Multi-task learning (MTL) and the attention mechanism have been proven to effectively extract robust acoustic features for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that integrates MTL and the attention-weighting mechanism to simultaneously realize a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI). The proposed ATM system consists of three parts: SE, SI, and attention-Net (AttNet). The SE part is composed of a long short-term memory (LSTM) model, and a deep neural network (DNN) model is used to develop the SI and AttNet parts. The overall ATM system first extracts the representative features, then enhances the speech signals with LSTM-SE and identifies the speaker with DNN-SI. The AttNet computes weights based on DNN-SI to prepare better representative features…
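The data flow described in the abstract can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation (a PyTorch repository, IanChan711/ATM_ide, is linked on this page): both branches read the same noisy spectral frames, DNN-SI has a single ReLU hidden layer, and AttNet turns that hidden layer into a sigmoid gate that rescales the LSTM-SE hidden state elementwise before the SE output layer. The class `ATMSketch`, the helper functions, and all layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dense_params(rng, n_in, n_out):
    """Random weights and zero bias for a fully connected layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def lstm_forward(x, W, U, b):
    """Single-layer LSTM over frames x of shape (T, D); returns hidden states (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((x.shape[0], H))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate cell
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

class ATMSketch:
    """Hypothetical ATM-style model: LSTM-SE, DNN-SI, and an AttNet gate."""

    def __init__(self, n_freq=257, n_hidden=64, n_speakers=10, seed=0):
        rng = np.random.default_rng(seed)
        H = n_hidden
        # LSTM-SE: one recurrent layer plus a linear output back to n_freq bins.
        self.W = rng.normal(0.0, 0.1, (n_freq, 4 * H))
        self.U = rng.normal(0.0, 0.1, (H, 4 * H))
        self.b = np.zeros(4 * H)
        self.se_out = dense_params(rng, H, n_freq)
        # DNN-SI: one hidden layer, then per-frame speaker logits.
        self.si_hidden = dense_params(rng, n_freq, H)
        self.si_out = dense_params(rng, H, n_speakers)
        # AttNet: maps SI hidden features to one weight per LSTM-SE hidden unit.
        self.att = dense_params(rng, H, H)

    def forward(self, noisy):
        """noisy: (T, n_freq) frames -> (enhanced frames, speaker logits, attention weights)."""
        w1, b1 = self.si_hidden
        si_h = np.maximum(0.0, noisy @ w1 + b1)  # DNN-SI hidden layer (ReLU)
        w2, b2 = self.si_out
        frame_logits = si_h @ w2 + b2  # (T, n_speakers)
        wa, ba = self.att
        att_w = sigmoid(si_h @ wa + ba)  # AttNet weights in (0, 1), shape (T, H)
        se_h = lstm_forward(noisy, self.W, self.U, self.b)
        wo, bo = self.se_out
        enhanced = (se_h * att_w) @ wo + bo  # gated LSTM features -> spectral frames
        return enhanced, frame_logits, att_w

if __name__ == "__main__":
    model = ATMSketch()
    noisy = np.random.default_rng(1).normal(size=(100, 257))
    enhanced, frame_logits, att_w = model.forward(noisy)
    speaker = int(np.argmax(frame_logits.mean(axis=0)))  # utterance-level decision
    print(enhanced.shape, frame_logits.shape)  # (100, 257) (100, 10)
```

Gating the SE hidden state with weights derived from the SI branch is one simple way to let speaker information shape the enhancement features; averaging frame logits into an utterance-level speaker decision is likewise an assumption made for illustration.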

Code & Models

Repositories

IanChan711/ATM_ide (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis