TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches

Rong Wang; Kun Sun

arXiv:2404.12077 · cs.SD · April 19, 2024 · 1 citation

TL;DR

This paper compares multi-task and single-task deep learning models for speaker profiling on the TIMIT dataset, highlighting the benefits and challenges of multi-task learning and the continued importance of feature engineering in these tasks.

Contribution

The paper provides an empirical assessment of multi-task versus single-task learning approaches for speaker profiling and emphasizes the role of feature engineering.

Findings

1. Multi-task learning benefits tasks of similar complexity.
2. Accent classification remains challenging.
3. Non-sequential features are effective for speaker recognition.
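The third finding contrasts non-sequential features with sequential ones. The listing does not specify the paper's feature set, so the sketch below assumes one common construction rather than the authors' method: per-frame vectors (for example MFCCs) form a variable-length sequential feature, and pooling them into per-dimension means and standard deviations yields a fixed-length, non-sequential vector that a conventional classifier can consume. The function name `pool_frames` and the example data are hypothetical.

```python
import math
from typing import List, Sequence


def pool_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Turn a variable-length sequence of per-frame feature vectors into one
    fixed-length vector: per-dimension means followed by per-dimension
    (population) standard deviations. Hypothetical helper for illustration."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimensionality")
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / n)
        for d in range(dim)
    ]
    return means + stds


if __name__ == "__main__":
    # Two hypothetical utterances with 2 and 3 frames of 2 features each.
    short = [[1.0, 2.0], [3.0, 4.0]]
    longer = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
    print(pool_frames(short))        # [2.0, 3.0, 1.0, 1.0]
    print(len(pool_frames(longer)))  # 4: same length despite more frames
```

Because every utterance maps to a vector twice as long as a single frame, utterances of different durations become directly comparable, at the cost of discarding frame order, which is the information a sequential model would exploit.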

Abstract

This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models. The motivation for this research is twofold: firstly, to empirically assess the advantages and drawbacks of multi-task learning over single-task models in the context of speaker profiling; secondly, to emphasize the undiminished significance of skillful feature engineering for speaker recognition tasks. The findings reveal challenges in accent classification, and multi-task learning is found advantageous for tasks of similar complexity. Non-sequential features are favored for speaker recognition, but sequential ones can serve as starting points for complex models. The study underscores the…
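The abstract names four tasks that can be trained jointly or separately. A common way to train such tasks jointly, assumed here rather than taken from the paper, is hard parameter sharing: one shared encoder feeds four task-specific heads, and training minimises a weighted sum of per-task losses. The sketch computes that objective for a single utterance in plain Python; the task keys, the loss weights, and the use of squared error for age estimation are illustrative assumptions.

```python
import math
from typing import Any, Dict, List


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one example, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def squared_error(prediction: float, target: float) -> float:
    """Regression loss, assumed here for age estimation."""
    return (prediction - target) ** 2


def multitask_loss(
    outputs: Dict[str, Any],
    targets: Dict[str, Any],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of the four per-task losses for one utterance.

    In a hard-parameter-sharing network every head reads the same shared
    encoder, so the gradient of this sum updates the shared layers with
    signal from all four tasks. Keys and weights are hypothetical."""
    losses = {
        "gender": cross_entropy(outputs["gender"], targets["gender"]),
        "accent": cross_entropy(outputs["accent"], targets["accent"]),
        "speaker": cross_entropy(outputs["speaker"], targets["speaker"]),
        "age": squared_error(outputs["age"], targets["age"]),
    }
    return sum(weights[task] * loss for task, loss in losses.items())


if __name__ == "__main__":
    # Illustrative outputs: uniform logits and an age guess 2 years off.
    outputs = {"gender": [0.0, 0.0], "accent": [0.0] * 4,
               "speaker": [0.0] * 3, "age": 30.0}
    targets = {"gender": 0, "accent": 2, "speaker": 1, "age": 28.0}
    equal = {"gender": 1.0, "accent": 1.0, "speaker": 1.0, "age": 1.0}
    age_only = {"gender": 0.0, "accent": 0.0, "speaker": 0.0, "age": 1.0}
    print(round(multitask_loss(outputs, targets, equal), 4))  # 7.1781
    print(multitask_loss(outputs, targets, age_only))         # 4.0
```

Setting every weight except one to zero reduces the sum to that task's loss alone, which is the objective a single-task model for that task would minimise; with nonzero weights, the tasks compete for the shared encoder's capacity.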

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing