Speech-to-Speech Translation For A Real-world Unwritten Language

Peng-Jen Chen; Kevin Tran; Yilin Yang; Jingfei Du; Justine Kao; Yu-An Chung; Paden Tomasello; Paul-Ambroise Duquenne; Holger Schwenk; Hongyu Gong; Hirofumi Inaguma; Sravya Popuri; Changhan Wang; Juan Pino; Wei-Ning Hsu; Ann Lee

arXiv:2211.06474·cs.CL·November 17, 2022·5 cites

TL;DR

This paper develops an end-to-end speech-to-speech translation system for unwritten languages, using English-Taiwanese Hokkien as a case study, and introduces new data collection, modeling techniques, and a benchmark dataset.

Contribution

It presents novel methods for data collection, weak supervision, and leveraging related languages, along with a benchmark dataset for unwritten language translation.

Findings

01

Effective use of pseudo-labeling for weakly supervised data

02

Leveraging related language (Mandarin) improves translation quality

03

Release of a new benchmark dataset for unwritten language S2ST

Abstract

We study speech-to-speech translation (S2ST), which translates speech from one language into another, and focus on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study and present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release. First, we present efforts on creating human-annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling side, we take advantage of recent advances in applying self-supervised discrete representations as prediction targets in S2ST and show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, in model training. Finally, we release an S2ST benchmark set to facilitate future research in…
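
The abstract mentions discrete representations as prediction targets but does not say how they are produced. As a rough illustration only, the sketch below assumes one common approach: each frame-level feature vector is assigned to its nearest centroid, and consecutive repeated unit IDs are then collapsed. The function names, the toy centroids, and the toy frames are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def nearest_centroid(frame: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    best_index = 0
    best_dist = float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((x - y) ** 2 for x, y in zip(frame, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def frames_to_units(frames: Sequence[Sequence[float]],
                    centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map every feature frame to a discrete unit ID (assumed quantizer)."""
    return [nearest_centroid(frame, centroids) for frame in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units into a single unit."""
    collapsed: List[int] = []
    for unit in units:
        if not collapsed or collapsed[-1] != unit:
            collapsed.append(unit)
    return collapsed


# Toy data: three hypothetical centroids and five 2-dimensional frames.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.1], [0.1, 0.9], [0.0, 1.0]]

units = frames_to_units(frames, centroids)   # one unit ID per frame
reduced = collapse_repeats(units)            # shorter target sequence
print(units, reduced)
```

In a real system the frames would come from a pretrained speech encoder and the centroids from clustering its features; both are replaced by hand-written toy values here.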

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling