Multilingual Speech Evaluation: Case Studies on English, Malay and Tamil
Huayun Zhang, Ke Shi, Nancy F. Chen

TL;DR
This paper studies automatic speech evaluation for English, Malay, and Tamil. It uses feature representations inspired by music processing and vector representation learning to improve the scoring of pronunciation, rhythm, and intonation, with a focus on low-resource languages.
Contribution
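The paper introduces a feature extraction approach, inspired by music processing, that is not tied to one language. This lets the same features support speech evaluation across languages with different rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed).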
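The summary does not say which music-inspired features the authors use. Purely as an illustration of what a music-processing-style rhythm descriptor can look like, the sketch below estimates a dominant periodicity from an energy envelope using autocorrelation, a standard ingredient of tempo estimation. The function names, the synthetic envelope, and the choice of autocorrelation are all assumptions made for this example, not the paper's method.

```python
def autocorrelation(envelope, max_lag):
    """Normalized autocorrelation of a mean-removed envelope for lags 1..max_lag."""
    n = len(envelope)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(envelope) - 1")
    mu = sum(envelope) / n
    centered = [x - mu for x in envelope]
    energy = sum(c * c for c in centered)
    if energy == 0:
        # A flat envelope has no periodic structure to measure.
        return [0.0] * max_lag
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(1, max_lag + 1)
    ]


def dominant_period(envelope, max_lag):
    """Lag (in frames) with the strongest autocorrelation peak."""
    acf = autocorrelation(envelope, max_lag)
    best_index = max(range(len(acf)), key=lambda k: acf[k])
    return best_index + 1  # acf[0] corresponds to lag 1


# Hypothetical envelope: one energy pulse every 5 frames (not real speech data).
envelope = [1.0 if i % 5 == 0 else 0.0 for i in range(50)]
period = dominant_period(envelope, max_lag=12)
print(period)
```

A descriptor of this kind depends only on the signal's energy over time, not on a phone inventory or lexicon, which is one reason periodicity-based features are plausible candidates for languages with few resources.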
Findings
Empirical validation shows consistent gains for all three languages.
The gains hold when predicting pronunciation, rhythm, and intonation.
Robust features that generalize well to low-resource languages.
Abstract
Speech evaluation is an essential component in computer-assisted language learning (CALL). While speech evaluation on English has been popular, automatic speech scoring on low-resource languages remains challenging. Work in this area has focused on monolingual-specific designs and handcrafted features stemming from resource-rich languages like English. Such approaches are often difficult to generalize to other languages, especially if we also want to consider suprasegmental qualities such as rhythm. In this work, we examine three different languages that possess distinct rhythm patterns: English (stress-timed), Malay (syllable-timed), and Tamil (mora-timed). We exploit robust feature representations inspired by music processing and vector representation learning. Empirical validations show consistent gains for all three languages when predicting pronunciation, rhythm and intonation…
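The rhythm classes named in the abstract are often quantified with duration-based metrics. One widely used example is the normalized Pairwise Variability Index (nPVI), which measures how much successive interval durations differ. The sketch below is a minimal implementation for illustration only: the abstract does not say the authors use nPVI, and the duration values are hypothetical.

```python
from statistics import mean


def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval durations.

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    """
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    if any(d <= 0 for d in durations):
        raise ValueError("durations must be positive")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


# Hypothetical vocalic interval durations in seconds (not taken from the paper).
even_timing = [0.10, 0.10, 0.10, 0.10]   # near-equal intervals
alternating = [0.05, 0.15, 0.05, 0.15]   # strong long/short alternation

print(npvi(even_timing))   # no variability between neighbours
print(npvi(alternating))   # high variability between neighbours
```

Higher nPVI values are typically associated with stress-timed speech, where long and short intervals alternate, and lower values with more evenly timed speech. A hand-picked metric like this is the kind of language-specific design the abstract contrasts with learned, more general representations.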
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Music and Audio Processing
