Kosp2e: Korean Speech to English Translation Corpus

Won Ik Cho; Seok Min Kim; Hyunchang Cho; Nam Soo Kim

arXiv:2107.02875·cs.CL·July 8, 2021

TL;DR

This paper introduces Kosp2e, a freely available corpus for translating Korean speech into English text. Baselines built on it reach a best BLEU of 21.3 with a pipeline approach and 18.0 with end-to-end models, supporting speech translation from a non-English source language.

Contribution

The paper presents a publicly available Korean speech-to-English translation corpus, built from open-license speech recognition, translation, and spoken language corpora, and evaluates it with both pipeline and end-to-end translation approaches.

Findings

01. Achieved best BLEU scores of 21.3 with the pipeline approach and 18.0 with end-to-end models.

02. Validated the feasibility of Korean speech-to-English translation using the dataset.

03. Demonstrated the potential for community-driven annotation expansion.

Abstract

Most speech-to-text (S2T) translation studies use English speech as a source, which makes it difficult for non-English speakers to take advantage of the S2T technologies. For some languages, this problem was tackled through corpus construction, but the farther linguistically from English or the more under-resourced, this deficiency and underrepresentedness becomes more significant. In this paper, we introduce kosp2e (read as 'kospi'), a corpus that allows Korean speech to be translated into English text in an end-to-end manner. We adopt open license speech recognition corpus, translation corpus, and spoken language corpora to make our dataset freely available to the public, and check the performance through the pipeline and training-based approaches. Using pipeline and various end-to-end schemes, we obtain the highest BLEU of 21.3 and 18.0 for each based on the English hypothesis,…
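The BLEU figures quoted above compare English hypotheses against reference translations. As a rough illustration of what that metric measures, here is a minimal sketch of corpus-level BLEU, assuming whitespace tokenization, a single reference per sentence, n-grams up to length 4, and no smoothing. The function names `ngrams` and `corpus_bleu` are made up for this example; the tokenization and scoring tool actually used in the paper are not stated on this page, so scores from this sketch should not be expected to match the reported numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale (illustrative only).

    Assumes one reference per hypothesis and whitespace tokenization.
    """
    matches = [0] * max_n   # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n    # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any zero precision drives the geometric mean to zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], refs))
    print(corpus_bleu(["the cat sat on a mat"], refs))
```

For a pipeline system, the hypotheses would be the output of machine translation applied to recognized Korean text; for an end-to-end system, they would come directly from the speech input. Either way the scoring step is the same, which is what makes the two BLEU figures comparable.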

Code & Models

Repositories

warnikchow/kosp2e (PyTorch, official)

Models

espnet/kosp2e-asr-ko (Hugging Face model)
