LUC at ComMA-2021 Shared Task: Multilingual Gender Biased and Communal Language Identification without using linguistic features

Rodrigo Cuéllar-Hidalgo; Julio de Jesús Guerrero-Zambrano; Dominic Forest; Gerardo Reyes-Salgado; Juan-Manuel Torres-Moreno

arXiv:2112.10189 · cs.CL · December 21, 2021

TL;DR

This paper evaluates how well probabilistic and vector space modeling (VSM) methods support the classification of social media documents as aggressive, gender biased, or communally charged without relying on linguistic features, and reports competitive performance in a multilingual shared task.

Contribution

It introduces an approach that combines VSM and probabilistic methods with standard machine learning algorithms, without using linguistic features, to classify social media language, tested in a multilingual shared task setting.

Findings

01

Effective classification of social network documents as aggressive, gender biased, or communally charged.

02

Competitive results achieved in the ComMA@ICON'21 shared task.

03

Identifies relevant configurations for language and bias detection.

Abstract

This work evaluates how well probabilistic and state-of-the-art vector space modeling (VSM) methods enable well-known machine learning algorithms to classify social network documents as aggressive, gender biased, or communally charged. To this end, an exploratory stage was performed first to find relevant settings to test: using the training and development samples, we trained multiple algorithms with multiple vector space modeling and probabilistic methods and discarded the less informative configurations. The resulting systems were submitted to the competition of the ComMA@ICON'21 Workshop on Multilingual Gender Biased and Communal Language Identification.
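
The exploratory stage described in the abstract can be pictured as a small configuration sweep. The sketch below is only an illustration under stated assumptions, not the authors' systems: the toy sentences, the labels `AGG`/`NAG`, the two feature extractors, the two learners (a multinomial Naive Bayes as the probabilistic method and a cosine nearest-centroid classifier as the vector space method), and the `keep` cutoff are all hypothetical choices. It trains every feature/learner pair on a training sample, scores each on a development sample, and keeps only the best-scoring configurations.

```python
import math
from collections import Counter, defaultdict

# Toy data invented for illustration only; not taken from the shared task.
TRAIN = [
    ("you are a total idiot", "AGG"),
    ("shut up you idiot", "AGG"),
    ("i hate you so much", "AGG"),
    ("have a lovely day", "NAG"),
    ("thank you so much friend", "NAG"),
    ("what a lovely friend", "NAG"),
]
DEV = [
    ("you idiot shut up", "AGG"),
    ("lovely day my friend", "NAG"),
]

def word_tokens(text):
    """Whitespace tokens; no linguistic preprocessing."""
    return text.split()

def char_trigrams(text):
    """Overlapping character 3-grams; language independent."""
    return [text[i:i + 3] for i in range(len(text) - 2)]

def train_naive_bayes(docs, featurize):
    """Multinomial Naive Bayes with add-one smoothing (probabilistic model)."""
    counts, priors, vocab = defaultdict(Counter), Counter(), set()
    for text, label in docs:
        feats = featurize(text)
        counts[label].update(feats)
        priors[label] += 1
        vocab.update(feats)
    total_docs = sum(priors.values())

    def predict(text):
        best, best_score = None, -math.inf
        for label in priors:
            denom = sum(counts[label].values()) + len(vocab)
            score = math.log(priors[label] / total_docs)
            for feat in featurize(text):
                score += math.log((counts[label][feat] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
    return predict

def train_centroid(docs, featurize):
    """Cosine nearest-centroid classifier over term-frequency vectors (VSM)."""
    centroids = defaultdict(Counter)
    for text, label in docs:
        centroids[label].update(featurize(text))

    def cosine(a, b):
        dot = sum(a[k] * b[k] for k in a if k in b)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def predict(text):
        vec = Counter(featurize(text))
        return max(centroids, key=lambda label: cosine(vec, centroids[label]))
    return predict

def evaluate(predict, docs):
    """Accuracy of a prediction function on labelled documents."""
    return sum(predict(text) == label for text, label in docs) / len(docs)

def explore(train, dev, keep=2):
    """Score every feature/learner pair on dev and keep the top `keep`."""
    featurizers = {"word": word_tokens, "char3": char_trigrams}
    learners = {"nb": train_naive_bayes, "centroid": train_centroid}
    results = []
    for feat_name, featurize in featurizers.items():
        for learner_name, learn in learners.items():
            model = learn(train, featurize)
            results.append(((feat_name, learner_name), evaluate(model, dev)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:keep], results

kept, all_results = explore(TRAIN, DEV)
for config, accuracy in all_results:
    print(config, accuracy)
```

Ranking on a held-out development sample rather than on the training sample is the design choice that matters here: it lets weak configurations be discarded before anything is submitted. A real sweep would likely use a task-appropriate metric and far more data than this toy setup.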

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Text Readability and Simplification