Contextual-Lexicon Approach for Abusive Language Detection

Francielle Vargas; Fabiana Rodrigues de Góes; Isabelle Carvalho; Fabrício Benevenuto; Thiago Alexandre Salgueiro Pardo

arXiv:2104.12265·cs.CL·December 22, 2022

TL;DR

This paper introduces a lexicon-based method for detecting offensive language on social media, utilizing contextual annotations, and demonstrates its effectiveness in Portuguese, outperforming existing baselines.

Contribution

A novel contextual-lexicon approach for abusive language detection that is adaptable to multiple languages and outperforms current baseline methods in Portuguese.

Findings

1. Outperforms the baseline methods for Portuguese

2. Makes effective use of a lexicon annotated with contextual information

3. Intended to be applicable to other languages, although validated only on Brazilian Portuguese

Abstract

Because a lexicon-based approach is scientifically more elegant, makes the components of the solution explainable, and is easier to generalize to other applications, this paper provides a new approach for offensive language and hate speech detection on social media. Our approach embodies a lexicon of implicit and explicit offensive and swearing expressions annotated with contextual information. Due to the severity of abusive comments on social media in Brazil, and the lack of research in Portuguese, Brazilian Portuguese is the language used to validate the models. Nevertheless, our method may be applied to any other language. The conducted experiments show the effectiveness of the proposed approach, which outperforms the current baseline methods for the Portuguese language.
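To make the idea of a lexicon annotated with contextual information more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy English terms, the two labels (`context_independent`, `context_dependent`), the function names, and the decision rule are all hypothetical, chosen only to illustrate how per-term contextual labels could feed a detector.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: term -> contextual label.
# The terms and both labels are illustrative assumptions,
# not entries or categories taken from the paper.
LEXICON = {
    "idiot": "context_independent",  # assumed offensive in any context
    "trash": "context_dependent",    # assumed offensive only in some contexts
    "pig": "context_dependent",
}

TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def lexicon_features(text, lexicon=LEXICON):
    """Count lexicon hits in `text`, grouped by contextual label."""
    tokens = TOKEN_RE.findall(text.lower())
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    return {
        "context_independent": counts["context_independent"],
        "context_dependent": counts["context_dependent"],
        "n_tokens": len(tokens),
    }


def flag_comment(text, lexicon=LEXICON):
    """Toy rule (an assumption): flag a comment if it contains any
    context-independent term, or two or more context-dependent terms."""
    feats = lexicon_features(text, lexicon)
    return feats["context_independent"] >= 1 or feats["context_dependent"] >= 2
```

In practice the feature dictionary would more plausibly be passed to a trained classifier than to a fixed rule, and a real lexicon would also need to handle multi-word expressions, which this single-token sketch does not.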

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Internet Traffic Analysis and Secure E-voting