Speech Corpus of Ainu Folklore and End-to-end Speech Recognition for Ainu Language

Kohei Matsuura; Sei Ueno; Masato Mimura; Shinsuke Sakai; Tatsuya Kawahara

arXiv:2002.06675·cs.CL·May 19, 2020·1 cites


Open Access

TL;DR

This paper presents the development of an Ainu speech corpus and an end-to-end speech recognition system, demonstrating promising accuracy and the benefits of multilingual training for this critically endangered language.

Contribution

It introduces a new Ainu speech corpus and evaluates end-to-end ASR models, highlighting the effectiveness of syllable units and multilingual training for low-resource language recognition.

Findings

01 Syllable-based models outperform the other modeling units (phone, word piece, and word) in accuracy.

02 The system achieved over 60% word accuracy in speaker-open conditions.

03 Multilingual training with English and Japanese improves recognition performance.

Abstract

Ainu is an unwritten language spoken by the Ainu people, one of the ethnic groups in Japan. It is recognized as critically endangered by UNESCO, and archiving and documenting its language heritage is of paramount importance. Although a considerable amount of voice recordings of Ainu folklore has been produced and accumulated to preserve the culture, only a very limited part of them has been transcribed so far. Thus, we started a project on automatic speech recognition (ASR) for the Ainu language in order to contribute to the development of annotated language archives. In this paper, we report on the development of the speech corpus and on the structure and performance of end-to-end ASR for Ainu. We investigated four modeling units (phone, syllable, word piece, and word) and found that the syllable-based model performed best in terms of both word and phone recognition accuracy, which were…
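The abstract reports results as word and phone recognition accuracy. This excerpt does not say how the authors score their output, so the following is only a minimal sketch of one common convention: accuracy taken as one minus the token-level edit distance divided by the reference length. The function names `edit_distance` and `accuracy` are hypothetical, and the tokens are placeholders rather than Ainu data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, deletions, and insertions each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy(ref_tokens, hyp_tokens):
    """Assumed definition: 1 - (edit distance / reference length).
    Can be negative when the hypothesis has many insertions."""
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return 1.0 - edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Placeholder tokens: one substitution (c -> x) and one deletion (e).
ref = "a b c d e".split()
hyp = "a b x d".split()
word_acc = accuracy(ref, hyp)
print(f"word accuracy: {word_acc:.2f}")
```

The same function gives a phone-level score when both sequences are lists of phone symbols instead of words, which is how a single hypothesis can be scored at both granularities.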


Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Test