Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

Chih-Kai Yang; Kuan-Po Huang; Ke-Han Lu; Chun-Yi Kuan; Chi-Yuan Hsiao; Hung-yi Lee

arXiv:2401.00273·eess.AS·January 2, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper evaluates recent foundation models on Mandarin-English code-switched speech tasks, showing self-supervised models perform nearly as well as supervised ones but still face challenges with intra-sentential switching.

Contribution

It provides a comprehensive assessment of large-scale self- and weakly-supervised models on code-switched speech, highlighting their strengths and areas for improvement.

Findings

01

Self-supervised models nearly match supervised performance

02

Models struggle with intra-sentential code-switching

03

Variants of Whisper remain effective in code-switching scenarios

Abstract

This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performances close to the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement as they kept making similar mistakes and had unsatisfactory performances on modeling intra-sentential code-switching. In addition, the validity of several variants of Whisper was explored, and we concluded that they remained effective in a code-switching scenario, and similar techniques for self-supervised models are worth studying to boost the performance of code-switched tasks.

Code & Models

Repositories

nobel861017/cs_zs_baseline
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling