STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Saad Naeem; Omer Beg

arXiv:2204.07848 · cs.CL · April 19, 2022 · 1 citation

Open Access

TL;DR

This paper introduces STRATA, a transfer learning-based neural framework that effectively recognizes phonemes and word boundaries in continuous Urdu speech, significantly reducing data requirements and improving accuracy over existing methods.

Contribution

STRATA is a novel framework combining transfer learning, attention, and data augmentation for phoneme recognition in low-resource languages like Urdu, reducing data annotation needs.

Findings

01. Achieves 16.5% Phoneme Error Rate on Urdu speech

02. Reduces network loss by 50% with transfer learning

03. Improves state-of-the-art accuracy on Urdu and English datasets
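
For readers unfamiliar with the metric in the first finding, Phoneme Error Rate is conventionally the edit distance between the reference and predicted phoneme sequences divided by the reference length. The sketch below illustrates that standard definition only; this page does not say how the authors computed or normalized their score, and the function names and the toy phoneme sequences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference phonemes matches an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance normalized by the reference length (assumed convention)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Toy example (made-up sequences, not data from the paper).
reference = ["k", "i", "t", "a", "b"]
predicted = ["k", "i", "d", "a"]
print(phoneme_error_rate(reference, predicted))  # 0.4: one substitution, one deletion
```

Under this definition, a 16.5% rate would correspond to roughly one error for every six reference phonemes.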

Abstract

Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. Systems that extract phonemes from speech audio require hand-labeled phonetic transcriptions. Producing these requires expert linguists to annotate speech data with its phonetic representation, which is both expensive and tedious. In this paper, we propose STRATA, a framework for supervised phoneme recognition that overcomes the data scarcity issue for low-resource languages using a seq2seq neural architecture integrated with transfer learning, an attention mechanism, and data augmentation. STRATA employs transfer learning to cut the network loss in half. It uses an attention mechanism for word-boundary and frame-alignment detection, which further reduces the network loss by 4% and identifies word boundaries with 92.2% accuracy. STRATA uses various…
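
To illustrate how an attention mechanism can expose frame alignment in a seq2seq model, the sketch below computes attention weights for one decoder step over a handful of encoder frames. It is a minimal illustration, not the authors' implementation: the abstract does not state which scoring function STRATA uses, so plain dot-product scoring is an assumption, and the names `softmax`, `attend`, and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention for a single decoder step (assumed scoring function).

    Returns the per-frame weights and the weighted context vector.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, frame))
              for frame in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * frame[k] for w, frame in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


# Toy example: three encoder frames with 2-dimensional states (made-up values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]

weights, context = attend(decoder_state, encoder_states)
# The frames carrying the largest weight are the ones the decoder aligns with
# while emitting the current output symbol.
aligned_frame = max(range(len(weights)), key=weights.__getitem__)
print(weights, aligned_frame)
```

Reading off the highest-weight frames per output step gives a soft alignment between audio frames and output symbols, which is the kind of signal a model could use to place word boundaries.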

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence