Women, Infamous, and Exotic Beings: A Comparative Study of Honorific Usages in Wikipedia and LLMs for Bengali and Hindi
Sourabrata Mukherjee, Atharva Mehta, Sougata Saha, Akhil Arora, Monojit Choudhury

TL;DR
This study analyzes honorific usage in Hindi and Bengali Wikipedia articles and evaluates how large language models replicate or diverge from these socio-pragmatic norms, revealing significant cross-linguistic and socio-demographic differences.
Contribution
The paper presents the first large-scale analysis of third-person honorific usage in Hindi and Bengali Wikipedia and investigates how far LLMs internalize these socio-pragmatic norms across languages and demographics.
Findings
Honorifics are more common in Bengali than in Hindi.
In both languages, and more prominently in Hindi, men are more often addressed with honorifics than women.
LLMs diverge from Wikipedia's honorific usage patterns.
Abstract
The obligatory use of third-person honorifics is a distinctive feature of several South Asian languages, encoding nuanced socio-pragmatic cues such as power, age, gender, fame, and social distance. In this work, (i) we present the first large-scale study of third-person honorific pronoun and verb usage across 10,000 Hindi and Bengali Wikipedia articles with annotations linked to key socio-demographic attributes of the subjects, including gender, age group, fame, and cultural origin. (ii) Our analysis uncovers systematic intra-language regularities but notable cross-linguistic differences: honorifics are more prevalent in Bengali than in Hindi, while non-honorifics dominate when referring to infamous, juvenile, and culturally exotic entities. Notably, in both languages, and more prominently in Hindi, men are more frequently addressed with honorifics than women. (iii) To examine whether…
Taxonomy
Topics: Wikis in Education and Collaboration
