Collaborative Learning for Language and Speaker Recognition

Lantian Li; Zhiyuan Tang; Dong Wang; Andrew Abel; Yang Feng; Shiyue Zhang

arXiv:1609.08442 · cs.SD · May 24, 2017 · 2 citations

TL;DR

This paper introduces a multi-task recurrent neural network that jointly performs language and speaker recognition, leveraging collaborative learning to enhance accuracy in both tasks.

Contribution

It proposes a novel unified model where language and speaker recognition tasks are integrated through a multi-task recurrent neural network, enabling mutual improvement.

Findings

01. Multi-task model outperforms task-specific models on both tasks

02. Collaborative learning improves recognition accuracy

03. Joint modeling benefits both language and speaker recognition

Abstract

This paper presents a unified model that performs language and speaker recognition simultaneously. The model is based on a multi-task recurrent neural network in which the output of one task is fed as input to the other, leading to a collaborative learning framework that can improve both language and speaker recognition by letting each task borrow information from the other. Our experiments demonstrate that the multi-task model outperforms the task-specific models on both tasks.
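
The cross-task feedback described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the plain tanh recurrent update, the use of the previous frame's output as the exchanged signal, the dimensions, and the hypothetical names `RecurrentBranch` and `run_collaborative`. The abstract states only that the output of one task is fed as input to the other.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product for list-based matrices."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentBranch:
    """One task-specific recurrent layer (a plain tanh RNN, by assumption)."""

    def __init__(self, feat_dim, peer_dim, hidden_dim, rng):
        # Input is [acoustic frame ; own previous output ; peer's previous output].
        self.weights = make_matrix(hidden_dim, feat_dim + hidden_dim + peer_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, frame, own_prev, peer_prev):
        return [math.tanh(a) for a in matvec(self.weights, frame + own_prev + peer_prev)]


def run_collaborative(frames, lang_branch, spk_branch):
    """Run both branches over an utterance, exchanging outputs at every frame."""
    lang_state = [0.0] * lang_branch.hidden_dim
    spk_state = [0.0] * spk_branch.hidden_dim
    lang_outputs, spk_outputs = [], []
    for frame in frames:
        # Both updates read the previous-frame outputs, so the exchange is symmetric.
        new_lang = lang_branch.step(frame, lang_state, spk_state)
        new_spk = spk_branch.step(frame, spk_state, lang_state)
        lang_state, spk_state = new_lang, new_spk
        lang_outputs.append(lang_state)
        spk_outputs.append(spk_state)
    return lang_outputs, spk_outputs


rng = random.Random(0)
FEAT_DIM, LANG_DIM, SPK_DIM = 4, 3, 5  # illustrative sizes only
lang_branch = RecurrentBranch(FEAT_DIM, SPK_DIM, LANG_DIM, rng)
spk_branch = RecurrentBranch(FEAT_DIM, LANG_DIM, SPK_DIM, rng)
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(6)]
lang_outputs, spk_outputs = run_collaborative(frames, lang_branch, spk_branch)
```

In a trained system each branch would carry its own classification loss (language identity for one, speaker identity for the other), and the two losses would be optimized jointly; the sketch shows only the forward information exchange.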

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing