Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition

Khin Me Me Chit; Laet Laet Lin

arXiv:2105.06253 · cs.LG · May 17, 2021

TL;DR

This paper investigates CTC-based end-to-end speech recognition for Myanmar, experimenting with different model architectures and label encoding methods on a low-resource corpus of nearly 26 hours. The best model reaches a CER of 4.72% and an SER of 12.38% on the test set.

Contribution

It presents an analysis of CTC-based models for Myanmar speech recognition in a low-resource setting, varying the model topology (with and without convolutional layers, different BLSTM depths) and the label encoding method.

Findings

01

Best model achieves CER of 4.72%

02

SER of 12.38% on test set

03

Model topology significantly impacts performance

Abstract

In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. A series of experiments is presented on the topology of the model in which the convolutional layers are added and dropped, different depths of bidirectional long short-term memory (BLSTM) layers are used and different label encoding methods are investigated. The experiments are carried out in low-resource scenarios using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves character error rate (CER) of 4.72% and syllable error rate (SER) of 12.38% on the test set.
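
The abstract reports results as CER and SER and relies on CTC output decoding. As a rough illustration only, and not the authors' evaluation code, the sketch below shows one common way such numbers are computed: a greedy CTC collapse (merge repeated labels, drop blanks) followed by an edit-distance-based error rate. The function names, the blank index of 0, and the pre-segmented placeholder syllables are all assumptions made for this example; the paper's own label encoding and syllable segmentation are not described here.

```python
from typing import Hashable, List, Sequence


def ctc_greedy_collapse(frame_labels: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame label sequence: merge repeats, then drop blanks.

    Assumes `blank` is the CTC blank index (0 here is an assumption).
    """
    out: List[int] = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> float:
    """Edit distance divided by reference length.

    With character tokens this is a CER; with syllable tokens, an SER.
    """
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical frame-level output; 0 is the blank label.
    print(ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
    # Placeholder syllables, assumed to be segmented already.
    ref_syllables = ["min", "ga", "la", "ba"]
    hyp_syllables = ["min", "ga", "ba"]
    print(error_rate(ref_syllables, hyp_syllables))
```

Because the same `error_rate` function works on any token sequence, the difference between CER and SER in this sketch comes down to how the reference and hypothesis are tokenized before scoring.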
