The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse

Ritesh Kumar; Enakshi Nandi; Laishram Niranjana Devi; Shyam; Ratan; Siddharth Singh; Akash Bhagat; Yogesh Dawer

arXiv:2111.10390 · cs.CL · November 23, 2021 · 1 citation

TL;DR

This paper introduces the ComMA V0.2 dataset, a multilingual collection of social media comments annotated with fine-grained tags for aggression, bias, and discursive roles, intended to support both detailed analysis and automatic detection of harmful content.

Contribution

It presents a new multilingual, fine-grained annotation scheme and dataset for aggression and bias in social media comments, describes how the dataset was developed, and reports baseline experiments.

Findings

1. The dataset includes 15,000 comments in four languages.
2. Comments are annotated with hierarchical tags for aggression, bias, and discourse roles.
3. Baseline models show promising results in automatic aggression detection.

Abstract

In this paper, we discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur. The context, here, is defined by the conversational thread in which a specific comment occurs and also by the "type" of discursive role that the comment is performing with respect to the previous comment. The initial dataset discussed here (and made available as part of the ComMA@ICON shared task) consists of a total of 15,000 annotated comments in four languages (Meitei, Bangla, Hindi, and Indian English) collected from various social media platforms such as YouTube, Facebook, Twitter, and Telegram. As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English. The paper gives a detailed description of the tagset being used for…
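The abstract defines a comment's "context" through the conversational thread it belongs to and its discursive role toward the previous comment. As a rough illustration of that idea, the sketch below shows one possible way to represent such records and to recover a comment's thread context by following parent links. It rests on assumptions: the `Comment` fields, the `thread_context` helper, and every label value (`aggressive`, `covert`, `attack`, `defend`, `gender`) are hypothetical placeholders, not the released ComMA schema or the paper's actual tagset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Comment:
    # Hypothetical record layout; field names are illustrative, not ComMA's schema.
    comment_id: str
    text: str
    language: str                     # e.g. "Hindi", "Bangla", "Meitei", "Indian English"
    parent_id: Optional[str]          # the previous comment this one responds to; None for a thread root
    aggression: Tuple[str, ...]       # hierarchical label path, coarse to fine (placeholder values)
    discourse_role: Optional[str] = None            # role relative to the parent (placeholder values)
    biases: List[str] = field(default_factory=list)  # bias tags (placeholder values)


def thread_context(comments: Dict[str, Comment], comment_id: str) -> List[Comment]:
    """Return the chain of comments from the thread root down to comment_id."""
    chain: List[Comment] = []
    seen = set()
    current: Optional[str] = comment_id
    while current is not None:
        if current in seen:  # guard against malformed data with circular parent links
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        node = comments[current]
        chain.append(node)
        current = node.parent_id
    chain.reverse()  # order the chain root-first, as a reader would see the thread
    return chain


# Toy, made-up thread of three comments; labels are placeholders only.
corpus = {
    "c1": Comment("c1", "Original post text", "Hindi", None, ("non_aggressive",)),
    "c2": Comment("c2", "Reply text", "Indian English", "c1",
                  ("aggressive", "covert"), "attack", ["gender"]),
    "c3": Comment("c3", "Reply to the reply", "Bangla", "c2",
                  ("non_aggressive",), "defend"),
}

ids = [c.comment_id for c in thread_context(corpus, "c3")]
print(ids)  # ['c1', 'c2', 'c3']
```

Storing only a parent link per comment keeps each record small while still letting the full thread, and the comment that a discursive role refers to, be reconstructed on demand.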

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression