Towards automatic identification of linguistic politeness in Hindi texts

Ritesh Kumar

arXiv:2111.15268 · cs.CL · December 1, 2021 · 1 citation

TL;DR

This paper develops an SVM-based classifier that automatically identifies linguistic politeness in Hindi texts. Trained on a manually annotated corpus of over 25,000 blog comments and using conventionalised Hindi politeness structures as features, it reaches over 77% accuracy.

Contribution

It describes the normative, conventionalised politeness structures of Hindi, drawing on discursive and interactional approaches to politeness, and uses these manually recognised structures as features when training the SVM.

Findings

01 The classifier achieves over 77% accuracy, within 2% of human accuracy.

02 Using manually recognised, Hindi-specific politeness structures as features significantly improves classifier performance on the test set.

03 The result illustrates the value of linguistic and cultural features in NLP tasks.

Abstract

In this paper I present a classifier for the automatic identification of linguistic politeness in Hindi texts. I used a manually annotated corpus of over 25,000 blog comments to train an SVM. Drawing on the discursive and interactional approaches to politeness, the paper gives an exposition of the normative, conventionalised politeness structures of Hindi. Using these manually recognised structures as features in training the SVM significantly improves the performance of the classifier on the test set. The trained system achieves an accuracy of over 77%, which is within 2% of human accuracy.
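The idea of adding manually recognised politeness structures to the SVM's feature set can be illustrated with a small sketch. This is not the paper's implementation: the marker list, the four toy sentences, the whitespace tokeniser, and the helper names (`featurize`, `train_linear_svm`, `predict`) are all assumptions made for illustration, and the trainer is a plain subgradient-descent linear SVM standing in for whatever SVM package was actually used.

```python
from collections import Counter

# Hypothetical marker list, for illustration only. The abstract does not
# enumerate the politeness structures used as features in the paper.
POLITENESS_MARKERS = ["कृपया", "धन्यवाद", "जी", "आप"]

def tokenize(text):
    # Naive whitespace tokeniser (an assumption, not the paper's method).
    return text.split()

def build_vocab(texts):
    # Map every token seen in the training texts to a column index.
    tokens = sorted({tok for t in texts for tok in tokenize(t)})
    return {tok: i for i, tok in enumerate(tokens)}

def featurize(text, vocab):
    # Bag-of-words counts followed by one binary flag per politeness marker.
    tokens = tokenize(text)
    counts = Counter(tokens)
    bow = [float(counts.get(tok, 0)) for tok in vocab]
    markers = [1.0 if m in tokens else 0.0 for m in POLITENESS_MARKERS]
    return bow + markers

def score(x, w, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    # Subgradient descent on the L2-regularised hinge loss (linear SVM).
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * score(xi, w, b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularisation step
            if margin < 1.0:                          # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(x, w, b):
    return 1 if score(x, w, b) >= 0 else -1

# Toy data: +1 = polite, -1 = not polite (invented examples).
TEXTS = ["कृपया यहाँ आइए", "आप का धन्यवाद जी", "यहाँ आ", "चुप रह"]
LABELS = [1, 1, -1, -1]

vocab = build_vocab(TEXTS)
X = [featurize(t, vocab) for t in TEXTS]
w, b = train_linear_svm(X, LABELS)
print([predict(x, w, b) for x in X])
```

Comparing a model trained on the bag-of-words columns alone with one trained on the full vector would mirror, at toy scale, the with-and-without-features comparison the abstract describes.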

Taxonomy

Topics: Language, Discourse, Communication Strategies · Language, Metaphor, and Cognition · Swearing, Euphemism, Multilingualism

Methods: Support Vector Machine