Literary and Colloquial Dialect Identification for Tamil using Acoustic Features
M. Nanmalar, P. Vijayalakshmi, T. Nagarajan

TL;DR
This paper presents an acoustic-feature-based system that uses Gaussian mixture models (GMMs) over Mel-frequency cepstral coefficients (MFCCs) to identify Tamil dialects without requiring annotated corpora, supporting speech technology that helps preserve dialects.
Contribution
It introduces a dialect identification method that relies on acoustic features alone, eliminating the need for linguistic tools or annotated data; the approach is adaptable to other languages.
Findings
Achieved a 12% error rate in dialect classification
Vowel nasalization contributes to classification accuracy
Performance varies with the number of GMM mixtures
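The GMM-based classification summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one diagonal-covariance GMM per dialect, trained with EM, and an utterance labeled by whichever model gives the higher summed frame log-likelihood. Random 3-dimensional vectors stand in for MFCC frames, and every function name and parameter (`train_gmm`, `classify`, `n_mix`, `var_floor`) is hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])


def train_gmm(frames, n_mix, n_iter=20, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (illustrative, unoptimized)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialize means from random frames and variances from the global variance.
    g_mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    g_var = [max(sum((f[d] - g_mean[d]) ** 2 for f in frames) / n, var_floor)
             for d in range(dim)]
    gmm = [(1.0 / n_mix, list(m), list(g_var)) for m in rng.sample(frames, n_mix)]
    for _ in range(n_iter):
        # E-step: posterior probability of each mixture for each frame.
        resp = []
        for x in frames:
            logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means, and floored variances.
        updated = []
        for k in range(n_mix):
            nk = sum(r[k] for r in resp) + 1e-10
            mean = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                    for d in range(dim)]
            var = [max(sum(r[k] * (x[d] - mean[d]) ** 2
                           for r, x in zip(resp, frames)) / nk, var_floor)
                   for d in range(dim)]
            updated.append((nk / n, mean, var))
        gmm = updated
    return gmm


def classify(utterance, models):
    """Return the label whose GMM gives the highest total log-likelihood."""
    scores = {label: sum(frame_log_likelihood(x, gmm) for x in utterance)
              for label, gmm in models.items()}
    return max(scores, key=scores.get)


def synthetic_frames(center, n, rng, dim=3):
    """Stand-in for MFCC frames: Gaussian noise around a class-specific center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(42)
models = {
    "literary": train_gmm(synthetic_frames(-2.0, 200, rng), n_mix=2),
    "colloquial": train_gmm(synthetic_frames(2.0, 200, rng), n_mix=2),
}
predicted = classify(synthetic_frames(2.0, 50, rng), models)
print(predicted)
```

In a real system the frames would come from an MFCC front end applied to recorded speech, and `n_mix` would be tuned on held-out data, since the findings above note that performance varies with the number of mixtures.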
Abstract
The evolution and diversity of a language are evident from its various dialects. If the various dialects are not addressed in technological advancements like automatic speech recognition and speech synthesis, there is a chance that these dialects may disappear. Speech technology plays a role in keeping the various dialects of a language from going extinct. In order to build a full-fledged automatic speech recognition system that addresses various dialects, an Automatic Dialect Identification (ADI) system acting as the front end is required. This is similar to how language identification systems act as front ends to automatic speech recognition systems that handle multiple languages. The current work proposes a way to identify two popular and broadly classified Tamil dialects, namely literary and colloquial Tamil. Acoustical characteristics rather than phonetics and phonotactics are used,…
