Literary and Colloquial Dialect Identification for Tamil using Acoustic Features

M. Nanmalar; P. Vijayalakshmi; T. Nagarajan

arXiv:2408.14887·eess.AS·August 28, 2024

TL;DR

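This paper presents a system that distinguishes literary from colloquial Tamil using acoustic features alone: Gaussian mixture models (GMMs) trained on Mel-frequency cepstral coefficients (MFCCs). Because it needs no annotated corpus, it can serve as a front end for dialect-aware speech technology, which the authors argue helps preserve dialects.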

Contribution

It introduces a dialect identification method that relies only on acoustic features, so it needs neither linguistic tools nor annotated data and can be adapted to other languages.

Findings

01

Achieved 12% error rate in dialect classification

02

Vowel nasalization contributes to classification accuracy

03

Performance varies with the number of GMM mixtures
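
As a rough illustration of the kind of classifier summarized above, the sketch below trains one diagonal-covariance GMM per dialect and assigns an utterance to the dialect whose model gives the higher average log-likelihood. This is a generic GMM setup, not the authors' implementation: the summary does not state the paper's feature configuration, mixture counts, or training procedure. The 2-D synthetic vectors are a stand-in for real MFCC frames, and every name here (`fit_gmm`, `classify`, `toy_frames`, `n_mix`, the cluster centers) is hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def frame_log_probs(frame, gmm):
    """Per-component log(weight * density) for one frame."""
    weights, means, variances = gmm
    return [
        math.log(w) + log_gauss_diag(frame, m, v)
        for w, m, v in zip(weights, means, variances)
    ]


def fit_gmm(frames, n_mix, n_iter=25, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with plain EM (illustrative only)."""
    rng = random.Random(seed)
    n, dim = len(frames), len(frames[0])
    # Initialise means from random frames, variances from the global variance.
    means = [list(f) for f in rng.sample(frames, n_mix)]
    gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
    gvar = [
        max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, var_floor)
        for d in range(dim)
    ]
    variances = [list(gvar) for _ in range(n_mix)]
    weights = [1.0 / n_mix] * n_mix
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        resp = []
        for f in frames:
            logp = frame_log_probs(f, (weights, means, variances))
            norm = logsumexp(logp)
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means, and floored variances.
        for k in range(n_mix):
            nk = sum(r[k] for r in resp) + 1e-10
            weights[k] = nk / n
            means[k] = [
                sum(r[k] * f[d] for r, f in zip(resp, frames)) / nk
                for d in range(dim)
            ]
            variances[k] = [
                max(
                    sum(r[k] * (f[d] - means[k][d]) ** 2
                        for r, f in zip(resp, frames)) / nk,
                    var_floor,
                )
                for d in range(dim)
            ]
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one GMM."""
    return sum(logsumexp(frame_log_probs(f, gmm)) for f in frames) / len(frames)


def classify(frames, models):
    """Return the label whose GMM scores the utterance highest."""
    return max(models, key=lambda label: avg_log_likelihood(frames, models[label]))


def toy_frames(rng, centers, n, spread=0.5):
    """Synthetic 2-D stand-in for MFCC frames; NOT real speech data."""
    frames = []
    for _ in range(n):
        cx, cy = rng.choice(centers)
        frames.append([rng.gauss(cx, spread), rng.gauss(cy, spread)])
    return frames


# Hypothetical cluster centers chosen only to make the toy classes separable.
LIT_CENTERS = [(0.0, 0.0), (4.0, 0.0)]
COL_CENTERS = [(0.0, 4.0), (4.0, 4.0)]

rng = random.Random(1)
models = {
    "literary": fit_gmm(toy_frames(rng, LIT_CENTERS, 200), n_mix=2),
    "colloquial": fit_gmm(toy_frames(rng, COL_CENTERS, 200), n_mix=2),
}
predicted = classify(toy_frames(rng, COL_CENTERS, 50), models)
print(predicted)
```

With real data, each frame would be an MFCC vector extracted from speech, and `n_mix` would be swept over several values, since the findings note that performance depends on the number of mixtures.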

Abstract

The evolution and diversity of a language are evident from its various dialects. If the various dialects are not addressed in technological advancements like automatic speech recognition and speech synthesis, there is a chance that these dialects may disappear. Speech technology plays a role in preserving the various dialects of a language from extinction. In order to build a full-fledged automatic speech recognition system that addresses various dialects, an Automatic Dialect Identification (ADI) system acting as the front end is required. This is similar to how language identification systems act as front ends to automatic speech recognition systems that handle multiple languages. The current work proposes a way to identify two popular and broadly classified Tamil dialects, namely literary and colloquial Tamil. Acoustical characteristics rather than phonetics and phonotactics are used,…
