MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Patrick Lee; Alain Chirino Trujillo; Diana Cuevas Plancarte; Olumide Ebenezer Ojo; Xinyi Liu; Iyanuoluwa Shode; Yuan Zhao; Jing Peng; Anna Feldman

arXiv:2401.14526 · cs.CL · January 29, 2024 · 1 citation

Open Access

TL;DR

This paper explores the use of multilingual transformer models to disambiguate potentially euphemistic terms across languages, demonstrating zero-shot learning and cross-lingual transfer benefits in euphemism processing.

Contribution

It introduces a multilingual transformer-based approach for euphemism disambiguation and analyzes cross-lingual transfer and domain-specific effects in euphemism understanding.

Findings

1. In some settings, multilingual models outperform monolingual models by a statistically significant margin.

2. Zero-shot cross-lingual learning is effective for euphemism disambiguation.

3. Cross-lingual data from the same domain (for example, death or bodily functions) improves model performance.

Abstract

This study investigates the computational processing of euphemisms, a universal linguistic phenomenon, across multiple languages. We train a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic "categories" such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data…
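The zero-shot cross-lingual setting described in the abstract can be illustrated with a minimal sketch: the model is trained on every language except the target and evaluated only on the target. Everything below is an assumption made for illustration, not the authors' code or data: the language codes, the placeholder sentences, the function names `zero_shot_split` and `macro_f1`, and the choice of macro-F1 as the metric (the excerpt does not state which metric the paper reports). The classifier itself (XLM-RoBERTa in the paper) is left out so the sketch runs with the standard library alone.

```python
from typing import List, Tuple

# (language code, sentence, label); label 1 = euphemistic use, 0 = literal use.
Example = Tuple[str, str, int]

# Placeholder examples for illustration only; not taken from the paper's data.
EXAMPLES: List[Example] = [
    ("en", "example sentence 1", 1),
    ("en", "example sentence 2", 0),
    ("es", "example sentence 3", 1),
    ("es", "example sentence 4", 0),
    ("zh", "example sentence 5", 1),
    ("zh", "example sentence 6", 0),
]


def zero_shot_split(
    examples: List[Example], test_lang: str
) -> Tuple[List[Example], List[Example]]:
    """Hold out one language entirely: train on the rest, test on the target."""
    train = [ex for ex in examples if ex[0] != test_lang]
    test = [ex for ex in examples if ex[0] == test_lang]
    if not test:
        raise ValueError(f"no examples for test language {test_lang!r}")
    return train, test


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over the two classes (0 and 1)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class with no gold and no predicted instances contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Train on English and Chinese, evaluate on Spanish (the unseen language).
train_set, test_set = zero_shot_split(EXAMPLES, test_lang="es")
```

A multilingual model fine-tuned on `train_set` would then be scored on `test_set`; comparing that score against a model trained on the target language alone is one way to frame the multilingual versus monolingual comparison.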

Taxonomy

Topics: Swearing, Euphemism, Multilingualism · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling
