Exploring CTC Based End-to-End Techniques for Myanmar Speech Recognition
Khin Me Me Chit, Laet Laet Lin

TL;DR
This paper investigates CTC-based end-to-end speech recognition for Myanmar. It experiments with different model architectures and label encoding methods on a low-resource corpus of nearly 26 hours of recorded speech, and its best model reaches a character error rate of 4.72% on the test set.
Contribution
It presents an analysis of CTC-based models for Myanmar speech recognition in a low-resource setting, examining how model topology (convolutional layers, BLSTM depth) and label encoding affect accuracy.
Findings
Best model achieves a CER of 4.72% on the test set
Best model achieves an SER of 12.38% on the test set
Model topology significantly impacts performance
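CER and SER are both edit-distance ratios. Each counts the substitutions (S), deletions (D), and insertions (I) needed to turn the hypothesis into the reference and divides by the reference length N, so error rate = (S + D + I) / N. CER counts characters and SER counts syllables. The minimal Python sketch below assumes this standard definition. The summary does not state the paper's exact tokenization (for example, whether CER counts Unicode code points), and the tokens in the example are placeholders rather than real Myanmar syllables.

```python
from typing import Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence[Hashable]], hyps: List[Sequence[Hashable]]) -> float:
    """Corpus-level (S + D + I) / N over paired reference/hypothesis token sequences."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be paired one-to-one")
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference set is empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / total


# CER: strings are sequences of characters (Unicode code points in Python).
cer = error_rate(["abcd"], ["abed"])
# SER: pass pre-segmented syllables (placeholder tokens; a real Myanmar
# syllable segmenter is assumed, not shown).
ser = error_rate([["ka", "la", "ma"]], [["ka", "ma"]])
```

Here `cer` is 0.25 (one substitution over four characters) and `ser` is 1/3 (one deleted syllable over three). Multiplying by 100 gives the percentage form used above. Errors are summed over the whole test set before dividing, rather than averaged per utterance. This is a common convention, but the paper's choice is not stated here.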
Abstract
In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. We present a series of experiments on the model topology: convolutional layers are added or removed, bidirectional long short-term memory (BLSTM) stacks of different depths are used, and different label encoding methods are investigated. The experiments are carried out in low-resource scenarios using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves a character error rate (CER) of 4.72% and a syllable error rate (SER) of 12.38% on the test set.
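A CTC model emits, for every audio frame, a distribution over the output units plus a special blank symbol. The label encoding determines what those units are (for example, characters or syllables). The minimal Python sketch below shows both ends of that pipeline under stated assumptions. First, it builds a label inventory with index 0 reserved for blank. Second, it applies best-path (greedy) CTC decoding: take the per-frame argmax, merge consecutive repeats, then drop blanks. The helper names (`build_vocab`, `ctc_greedy_decode`, `labels_to_text`) are hypothetical. The abstract does not say which decoder the authors use, and beam search is a common alternative.

```python
from itertools import groupby
from typing import Callable, Dict, Iterable, List, Sequence

BLANK_ID = 0  # assumption: label index 0 is reserved for the CTC blank symbol


def build_vocab(transcripts: Iterable[str],
                tokenize: Callable[[str], List[str]]) -> Dict[str, int]:
    """Map each distinct output unit to an integer label, keeping 0 for blank."""
    vocab: Dict[str, int] = {}
    for text in transcripts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab) + 1  # shift by one to skip BLANK_ID
    return vocab


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      blank: int = BLANK_ID) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    merged = [label for label, _ in groupby(best_path)]  # collapse consecutive repeats
    return [label for label in merged if label != blank]


def labels_to_text(labels: List[int], vocab: Dict[str, int], sep: str = "") -> str:
    """Invert the vocabulary and join the decoded units back into text."""
    id_to_token = {i: t for t, i in vocab.items()}
    return sep.join(id_to_token[i] for i in labels)


# Character-level encoding: list() splits a string into Unicode code points.
vocab = build_vocab(["ab"], list)

# Toy per-frame probabilities over [blank, "a", "b"].
# Per-frame argmax path: 1 1 0 1 2 2 0.
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
    [0.6, 0.2, 0.2],
]
labels = ctc_greedy_decode(probs)
text = labels_to_text(labels, vocab)
```

The blank between the two runs of label 1 is what lets CTC output a genuinely repeated unit: the path `1 1 0 1 2 2 0` decodes to `[1, 1, 2]`, i.e. `"aab"`. For Myanmar script, `list()` yields code points, so medials and vowel signs become separate labels. A syllable-level encoding would instead pass a syllable segmenter as `tokenize`. That segmenter is assumed here, not shown.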
