Linguistic Taboos and Euphemisms in Nepali

Nobal B. Niraula; Saurab Dulal; Diwa Koirala

arXiv:2007.13798 · cs.CL · July 29, 2020

TL;DR

This paper provides a comprehensive analysis of linguistic taboos and euphemisms in Nepali, including a new dataset of offensive terms, to aid in offensive language detection and language learning.

Contribution

It introduces a detailed corpus-based study of Nepali offensive language, categorizes taboo words, and presents a manually curated dataset of over 1000 offensive terms.
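As a rough illustration of how a curated term list of this kind might feed offensive language detection, the sketch below flags tokens that appear in a lexicon. The lexicon entries, the function names, and the whitespace tokenization are placeholders assumed for illustration; none of them come from the paper or its repository. Because offensiveness depends on context, a lookup like this is at best a first-pass filter.

```python
import unicodedata
from typing import Iterable, List

# Hypothetical placeholder entries; a real lexicon would be loaded
# from a curated term list rather than hard-coded.
PLACEHOLDER_LEXICON = {"badword1", "badword2"}

# Punctuation stripped from token edges, including the Devanagari danda.
EDGE_PUNCTUATION = ".,!?;:\"'()।"


def normalize(token: str) -> str:
    """NFC-normalize, lowercase, and strip punctuation from both ends."""
    return unicodedata.normalize("NFC", token).lower().strip(EDGE_PUNCTUATION)


def flag_offensive(text: str, lexicon: Iterable[str]) -> List[str]:
    """Return the normalized tokens of `text` that occur in `lexicon`."""
    normalized_lexicon = {normalize(term) for term in lexicon}
    hits = []
    for token in text.split():  # assumption: whitespace tokenization
        norm = normalize(token)
        if norm in normalized_lexicon:
            hits.append(norm)
    return hits


if __name__ == "__main__":
    print(flag_offensive("This has badword1, and nothing else.", PLACEHOLDER_LEXICON))
```

Exact matching of this kind would miss euphemisms, spelling variants, and inflected forms, so it should be read as a baseline sketch, not as the authors' method.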

Findings

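01 Identified more than 18 categories of linguistic offenses, including politics, religion, race, and sex

02 Discussed 12 common euphemisms, such as synonym, metaphor, and circumlocution

03 Created a manually curated dataset of more than 1000 offensive terms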

Abstract

Languages across the world have words, phrases, and behaviors (the taboos) that are avoided in public communication because they are considered obscene or disturbing to the social, religious, and ethical values of society. However, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness depends on factors such as socio-physical setting, speaker-listener relationship, and word choice. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 different categories of linguistic offenses, including politics, religion, race, and sex. We discuss 12 common euphemisms such as synonym, metaphor, and circumlocution. In addition, we…

Code & Models

Repositories

nowalab/offensive-nepali (official)