Kannada Named Entity Recognition and Classification (NERC) Based on Multinomial Naïve Bayes (MNB) Classifier

S. Amarappa; S. V. Sathyanarayana

arXiv:1509.04385·cs.CL·September 21, 2015

TL;DR

This paper presents a Kannada NERC model that uses a Multinomial Naïve Bayes classifier with tf-idf feature extraction, reporting 83% precision, 79% recall, and an 81% F1-score in experiments with a training corpus of 95,170 tokens and a test corpus of 5,000 tokens.

Contribution

The paper introduces an NERC approach for Kannada based on a Multinomial Naïve Bayes (MNB) classifier with tf-idf features, and discusses the issues that arise in developing the model.

Findings

01. Achieved 83% precision, 79% recall, and an 81% F1-score.

02. Used tf-idf vectorization for feature extraction.

03. Evaluated on a Kannada corpus of 95,170 training tokens and 5,000 test tokens.
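
The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the findings above.
reported_precision = 0.83
reported_recall = 0.79

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.81, matching the reported F1-score
```

The harmonic mean sits closer to the lower of the two values, which is why F1 lands nearer 79% than a simple average would suggest when the gap is wider.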

Abstract

Named Entity Recognition and Classification (NERC) is the process of identifying proper nouns in text and classifying them into predefined categories such as person name, location, organization, date, and time. NERC in Kannada is an essential and challenging task. The aim of this work is to develop a novel model for NERC based on a Multinomial Naïve Bayes (MNB) classifier. The methodology is based on feature extraction from the training corpus, using term frequency and inverse document frequency and fitting them to a tf-idf vectorizer. The paper discusses the various issues in developing the proposed model, along with the details of implementation and performance evaluation. The experiments are conducted on a training corpus of 95,170 tokens and a test corpus of 5,000 tokens. It is observed that the model works with Precision, Recall…
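
The pipeline the abstract describes (tf-idf weighting followed by a Multinomial Naïve Bayes classifier) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature templates (`w=`, `prev=`, `next=`, `suf=`), the transliterated toy tokens, the label set, and the helper names `fit_idf` and `tfidf` are all hypothetical, and the abstract does not say which features or smoothing the paper used.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Learn smoothed inverse document frequencies from feature lists."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf(doc, idf):
    """Weight each known feature by term frequency times idf."""
    counts = Counter(t for t in doc if t in idf)  # unseen features are dropped
    return {t: c * idf[t] for t, c in counts.items()}


class MultinomialNB:
    """Multinomial Naive Bayes over weighted features, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, vectors, labels):
        self.class_count = Counter(labels)
        self.vocab = {t for vec in vectors for t in vec}
        self.weight = defaultdict(lambda: defaultdict(float))
        for vec, label in zip(vectors, labels):
            for t, w in vec.items():
                self.weight[label][t] += w
        self.total = {c: sum(self.weight[c].values()) for c in self.class_count}
        return self

    def predict(self, vec):
        n = sum(self.class_count.values())
        best_label, best_score = None, -math.inf
        for c, count in self.class_count.items():
            denom = self.total[c] + self.alpha * len(self.vocab)
            score = math.log(count / n)  # log prior
            for t, w in vec.items():
                likelihood = (self.weight[c].get(t, 0.0) + self.alpha) / denom
                score += w * math.log(likelihood)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy data: one feature list per token, not the paper's corpus.
train_docs = [
    ["w=rama", "next=bengalurige"],
    ["w=sita", "next=mysurige"],
    ["w=bengalurige", "prev=rama", "suf=ge"],
    ["w=mysurige", "prev=sita", "suf=ge"],
    ["w=hodanu", "prev=bengalurige"],
    ["w=hodalu", "prev=mysurige"],
]
train_labels = ["PERSON", "PERSON", "LOCATION", "LOCATION", "OTHER", "OTHER"]

idf = fit_idf(train_docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in train_docs], train_labels)

unseen = ["w=mangalurige", "prev=rama", "suf=ge"]
print(model.predict(tfidf(unseen, idf)))  # LOCATION
```

The unseen token itself contributes nothing here, because `w=mangalurige` never appeared in training; the prediction rests on the context and suffix features. The sketch also skips the length normalization that many tf-idf implementations apply, to keep the arithmetic easy to follow.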
