Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

Lei Wang; Rong Tong; Cheung Chi Leung; Sunil Sivadas; Chongjia Ni; Bin Ma

arXiv:2210.03580·cs.CL·October 10, 2022

TL;DR

This paper discusses the development of cloud-based automatic speech recognition systems for Southeast Asian languages, focusing on resource collection strategies for Bahasa Indonesia and Thai amid resource limitations.

Contribution

It introduces resource collection strategies for building ASR systems for under-resourced Southeast Asian languages using cloud-based approaches.

Findings

01 Effective resource collection methods demonstrated for Bahasa Indonesia and Thai

02 Addressed challenges of limited speech and text data in regional languages

03 Proposed strategies improve ASR development for under-resourced languages

Abstract

This paper provides an overview of our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. Because little prior work exists on these regional languages, several difficulties must be addressed before building the systems, including limited speech and text resources and a lack of linguistic knowledge. This work takes Bahasa Indonesia and Thai as examples to illustrate the strategies for collecting the various resources required to build ASR systems.
