Literary and Colloquial Tamil Dialect Identification

M. Nanmalar; P. Vijayalakshmi; T. Nagarajan

arXiv:2408.13739 · eess.AS · August 27, 2024

TL;DR

This paper explores five methods for distinguishing Literary Tamil (LT) from Colloquial Tamil (CT), as a first step toward dialect conversion and more accessible human-computer interaction. The systems listed under Findings reach 93.97% to 95.61% dialect identification accuracy.

Contribution

The paper introduces and evaluates multiple methods for identifying Literary and Colloquial Tamil, including novel explicit unified phone recognition approaches. The authors describe dialect identification for this pair as previously unexplored.

Findings

01. CNN achieved 93.97% accuracy
02. P-LVCSR achieved 94.21% accuracy
03. UPR-2 with P-LVCSR achieved 95.61% accuracy

Abstract

Culture and language evolve together. The old literary form of Tamil is commonly used for writing, and contemporary colloquial Tamil is used for speaking. Human-computer interaction applications require Colloquial Tamil (CT) to be accessible and easy for the everyday user, and they require Literary Tamil (LT) when information is needed in a formal written format. Continuing to use LT alongside CT in computer-aided language learning applications will both preserve LT and provide ease of use via CT. Hence there is a need for conversion between the LT and CT dialects, which demands dialect identification as a first step. Dialect Identification (DID) of LT and CT is an unexplored area of research. In the current work, keeping the nuances of both dialects in mind, five methods are explored, which include two implicit methods - Gaussian Mixture Model…
