TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
Rong Wang, Kun Sun

TL;DR
This paper compares multi-task and single-task deep learning models for speaker profiling on the TIMIT dataset, highlighting the benefits and challenges of multi-task learning and the importance of feature engineering for these tasks.
Contribution
It provides an empirical assessment of multi-task versus single-task learning approaches for speaker profiling and emphasizes the role of feature engineering.
Findings
Multi-task learning benefits tasks of similar complexity.
Accent classification remains challenging.
Non-sequential features are effective for speaker recognition.
Abstract
This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models. The motivation for this research is twofold: firstly, to empirically assess the advantages and drawbacks of multi-task learning over single-task models in the context of speaker profiling; secondly, to emphasize the undiminished significance of skillful feature engineering for speaker recognition tasks. The findings reveal challenges in accent classification, and multi-task learning is found advantageous for tasks of similar complexity. Non-sequential features are favored for speaker recognition, but sequential ones can serve as starting points for complex models. The study underscores the…
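The multi-task setup described in the abstract can be illustrated with a minimal sketch of a combined training objective. This is not the authors' implementation: the function names, the task weights, the choice of squared error for age, and the class counts in the usage example are all assumptions made for illustration. The sketch treats gender, accent, and speaker identity as classification tasks and age as regression, and sums the per-task losses with weights. Keeping a single non-zero weight reduces it to a single-task objective.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; target is a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def squared_error(pred, target):
    """Squared error for one regression example (assumed loss for age)."""
    return (pred - target) ** 2

# Hypothetical equal weights; the paper does not specify a weighting here.
TASK_WEIGHTS = {"gender": 1.0, "accent": 1.0, "speaker": 1.0, "age": 1.0}

def multi_task_loss(outputs, targets, weights=TASK_WEIGHTS):
    """Weighted sum of per-task losses; tasks missing from weights count as 0."""
    losses = {
        "gender": cross_entropy(outputs["gender"], targets["gender"]),
        "accent": cross_entropy(outputs["accent"], targets["accent"]),
        "speaker": cross_entropy(outputs["speaker"], targets["speaker"]),
        "age": squared_error(outputs["age"], targets["age"]),
    }
    total = sum(weights.get(task, 0.0) * loss for task, loss in losses.items())
    return total, losses

# Illustrative model outputs for one utterance (class counts are assumptions).
outputs = {"gender": [0.0, 0.0], "accent": [0.0] * 8, "speaker": [0.0] * 4, "age": 30.0}
targets = {"gender": 1, "accent": 3, "speaker": 2, "age": 32.0}

total, per_task = multi_task_loss(outputs, targets)
gender_only, _ = multi_task_loss(outputs, targets, weights={"gender": 1.0})
```

In this sketch the unscaled age error can dominate the classification terms, which is one reason a practical multi-task setup might normalize targets or tune the task weights.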
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
