Tamil Vowel Recognition With Augmented MNIST-like Data Set

Muthiah Annamalai

arXiv:2006.08367 · cs.CV · June 18, 2020

Open Access

TL;DR

This paper introduces an MNIST-compatible dataset of Tamil vowels and shows that a 4-layer CNN trained on it reaches 82% cross-validation accuracy, a step toward OCR and handwriting recognition for Tamil script.

Contribution

The paper's main novelty is an MNIST-like dataset of Tamil vowels, together with a demonstration that a CNN trained on it reaches 92% training accuracy and 82% cross-validation accuracy.

Findings

01

Achieved 92% training accuracy with CNN on the dataset.

02

Achieved 82% cross-validation accuracy.

03

Top-1 accuracy of 70% and top-2 accuracy of 92% on handwritten vowels.
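
Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true label; top-k accuracy also accepts the true label anywhere among the k highest-scoring classes. The sketch below illustrates that metric in plain Python. The function name `top_k_accuracy` and the toy scores and labels are made up for illustration and are not taken from the paper's data or code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example (hypothetical values): 4 samples, 3 classes.
scores = [
    [0.70, 0.20, 0.10],
    [0.30, 0.50, 0.20],
    [0.10, 0.30, 0.60],
    [0.40, 0.35, 0.25],
]
labels = [0, 0, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-k accuracy can never be lower than top-1 on the same predictions, which is why a top-2 figure sits above the top-1 figure for the same network.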

Abstract

We report generation of an MNIST [4] compatible data set [1] for Tamil vowels to enable building a classification DNN or other such ML/AI deep learning [2] models for Tamil OCR/Handwriting applications. We report the capability of the 60,000 grayscale, 28x28 pixel dataset to build a 4-layer CNN, with 100,000+ parameters, in TensorFlow that reaches 92% training accuracy and 82% cross-validation accuracy. We also report, for the same network, a top-1 classification accuracy of 70% and a top-2 classification accuracy of 92% on handwritten vowels.
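
To give a feel for how a small CNN on 28x28 grayscale input ends up with 100,000+ parameters, the sketch below counts the weights of one possible 4-layer network. The layer layout (two 3x3 convolutions with 32 and 64 filters, each followed by 2x2 pooling, then two dense layers) and the choice of 12 output classes are assumptions made for illustration; the abstract does not specify the author's architecture or class count.

```python
def conv2d_params(kernel, c_in, c_out):
    """Weights plus biases of a square-kernel 2D convolution."""
    return kernel * kernel * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_out(size, kernel):
    """Output side length of an unpadded ('valid'), stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output side length of non-overlapping max pooling."""
    return size // pool


NUM_CLASSES = 12  # assumption: one class per Tamil vowel

# Track the spatial size through the hypothetical network.
side = 28                            # 28x28 grayscale input, as in the abstract
side = pool_out(conv_out(side, 3))   # conv 3x3 then pool 2x2: 28 -> 26 -> 13
side = pool_out(conv_out(side, 3))   # conv 3x3 then pool 2x2: 13 -> 11 -> 5
flat = side * side * 64              # flattened feature vector fed to dense layers

layers = {
    "conv1 (1 -> 32 filters)": conv2d_params(3, 1, 32),
    "conv2 (32 -> 64 filters)": conv2d_params(3, 32, 64),
    "dense1 (flat -> 64)": dense_params(flat, 64),
    "dense2 (64 -> classes)": dense_params(64, NUM_CLASSES),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count}")
print(f"total parameters: {total}")
```

In this hypothetical layout most of the parameters sit in the first dense layer, so the size of the flattened feature map largely determines the total.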

Taxonomy

Topics: Handwritten Text Recognition Techniques · Music and Audio Processing · Speech Recognition and Synthesis