Literary and Colloquial Tamil Dialect Identification
M. Nanmalar, P. Vijayalakshmi, T. Nagarajan

TL;DR
This paper explores five methods for distinguishing between the Literary and Colloquial Tamil dialects, as a first step toward dialect conversion and more accessible human-computer interaction, with reported identification accuracies of up to 95.61%.
Contribution
It introduces and evaluates multiple dialect identification methods, including novel explicit unified phone recognition approaches, for the first time in Tamil dialect research.
Findings
CNN achieved 93.97% accuracy
P-LVCSR achieved 94.21% accuracy
UPR-2 with P-LVCSR achieved 95.61% accuracy
Abstract
Culture and language evolve together. The old literary form of Tamil is commonly used for writing, while contemporary colloquial Tamil is used for speaking. Human-computer interaction applications require Colloquial Tamil (CT) to be accessible and easy for the everyday user, and they require Literary Tamil (LT) when information is needed in a formal written format. Continuing to use LT alongside CT in computer-aided language learning applications will both preserve LT and provide ease of use via CT. Hence there is a need for conversion between the LT and CT dialects, which demands dialect identification as a first step. Dialect Identification (DID) of LT and CT is an unexplored area of research. In the current work, keeping the nuances of both dialects in mind, five methods are explored, which include two implicit methods - Gaussian Mixture Model…
