Multi-accent Speech Separation with One Shot Learning

Kuan-Po Huang; Yuan-Kuei Wu; Hung-yi Lee

arXiv:2106.11713 · cs.SD · August 6, 2021

Open Access

TL;DR

This paper addresses multi-accent speech separation by applying two meta-learning techniques, model-agnostic meta-learning (MAML) and its first-order approximation (FOMAML), to improve adaptation to unseen speakers and accents. Both outperform conventional joint training on almost all unseen accents.

Contribution

The study introduces the use of MAML and FOMAML for multi-accent speech separation, demonstrates their effectiveness on unseen accents, and shows that FOMAML matches MAML's performance while reducing training time.

Findings

01. MAML and FOMAML obtain higher average SI-SNRi than joint training on almost all unseen accents.

02. FOMAML achieves performance similar to MAML with less training time.

03. Meta-learning yields initial parameters that adapt well to speech mixtures from new speakers and accents.
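
The difference between MAML and FOMAML can be illustrated with a toy sketch. The code below is not the paper's separation model: it assumes a hypothetical one-parameter regression model `y = w * x`, hand-made tasks, and made-up learning rates, purely to show where the two meta-gradients differ. MAML differentiates through the inner adaptation step, which brings in a second-derivative term. FOMAML drops that term and uses the query gradient at the adapted parameters directly, which is why it tends to be cheaper.

```python
# Toy illustration only: a scalar model y = w * x with mean squared error.
# All names, tasks, and hyperparameters here are hypothetical.

def grad(w, xs, ys):
    """First derivative of the mean squared error with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def hessian(xs):
    """Second derivative of the same loss; constant in w for this model."""
    return sum(2.0 * x * x for x in xs) / len(xs)

def meta_gradient(w, task, inner_lr, first_order):
    (xs_s, ys_s), (xs_q, ys_q) = task
    # Inner loop: one gradient step on the task's support set.
    w_adapted = w - inner_lr * grad(w, xs_s, ys_s)
    # Gradient of the query loss, evaluated at the adapted parameter.
    g_query = grad(w_adapted, xs_q, ys_q)
    if first_order:
        # FOMAML: treat d(w_adapted)/dw as 1 and skip second derivatives.
        return g_query
    # MAML: chain rule through the inner step, d(w_adapted)/dw = 1 - lr * H.
    return g_query * (1.0 - inner_lr * hessian(xs_s))

def meta_train(tasks, first_order, inner_lr=0.05, outer_lr=0.05, steps=200):
    w = 0.0
    for _ in range(steps):
        # Outer loop: average the meta-gradient over all training tasks.
        g = sum(meta_gradient(w, t, inner_lr, first_order) for t in tasks)
        w -= outer_lr * g / len(tasks)
    return w

def make_task(slope):
    """Build a (support, query) pair for the target function y = slope * x."""
    support = ([1.0, 2.0], [slope * 1.0, slope * 2.0])
    query = ([3.0, 4.0], [slope * 3.0, slope * 4.0])
    return support, query

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_maml = meta_train(tasks, first_order=False)
w_fomaml = meta_train(tasks, first_order=True)
print(w_maml, w_fomaml)
```

In this toy setting the two variants reach the same initialization, because the second-order term only rescales the gradient. In a deep separation network the term is a Hessian-vector product, so the two methods need not agree exactly and the cost saving from skipping it is larger.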

Abstract

Speech separation has been studied intensively in recent years. However, little work has addressed the multi-accent speech separation scenario. Unseen speakers with new accents and noise give rise to a domain mismatch problem that conventional joint training methods cannot easily solve. We therefore apply MAML and FOMAML to this problem and obtain higher average SI-SNRi (scale-invariant signal-to-noise ratio improvement) values than joint training on almost all the unseen accents. This shows that both methods can produce well-trained initial parameters for adapting to speech mixtures of new speakers and accents. Furthermore, we find that FOMAML achieves performance similar to MAML while taking considerably less time.
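
The evaluation metric named in the abstract can be sketched as follows. This is a minimal illustration of the commonly used scale-invariant SNR definition, not the authors' evaluation code. The function names, the `eps` constant, and the synthetic sine signals are assumptions made for the example. The improvement is the SI-SNR of the separated estimate minus the SI-SNR of the unprocessed mixture, both measured against the clean target.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample lists."""
    n = len(target)
    # Remove the mean so the measure ignores any DC offset.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the scaled target part.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    s_target = [dot / t_energy * b for b in t]
    # Whatever is left after the projection counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)

def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the estimate over the raw mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)

# Hypothetical signals: a target tone, an interfering tone, and an
# estimate in which the interferer has been attenuated by a factor of 10.
target = [math.sin(0.1 * i) for i in range(200)]
interferer = [math.sin(0.37 * i + 1.0) for i in range(200)]
mixture = [t + v for t, v in zip(target, interferer)]
estimate = [t + 0.1 * v for t, v in zip(target, interferer)]
print(si_snr_improvement(estimate, mixture, target))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by a constant leaves the score essentially unchanged, which is the property that makes the metric scale-invariant.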

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing