The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse
Ritesh Kumar, Enakshi Nandi, Laishram Niranjana Devi, Shyam Ratan, Siddharth Singh, Akash Bhagat, Yogesh Dawer

TL;DR
This paper introduces the ComMA V0.2 dataset, a multilingual, annotated social media comment dataset with detailed tags for aggression, bias, and discourse roles, enabling advanced analysis and automatic detection of harmful content.
Contribution
It presents a new multilingual dataset and a fine-grained annotation scheme for aggression and bias in social media comments, and describes the dataset's development along with baseline experiments.
Findings
Dataset includes 15,000 comments in four languages.
Annotated with hierarchical tags for aggression, bias, and discourse roles.
Baseline models show promising results in automatic aggression detection.
Abstract
In this paper, we discuss the development of a multilingual dataset annotated with a hierarchical, fine-grained tagset marking different types of aggression and the "context" in which they occur. The context, here, is defined by the conversational thread in which a specific comment occurs and also by the "type" of discursive role that the comment performs with respect to the previous comment. The initial dataset discussed here (and made available as part of the ComMA@ICON shared task) consists of a total of 15,000 annotated comments in four languages - Meitei, Bangla, Hindi, and Indian English - collected from various social media platforms such as YouTube, Facebook, Twitter, and Telegram. As is usual on social media websites, a large number of these comments are multilingual, mostly code-mixed with English. The paper gives a detailed description of the tagset being used for…
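The abstract describes each comment as carrying hierarchical aggression tags, a discursive role relative to the previous comment, and a position in a conversational thread. A minimal Python sketch of how such a record might be represented is shown below. It is an illustration only: the class name, field names, the `thread_context` helper, and the placeholder tag values are all assumptions, not the dataset's actual schema or tagset.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AnnotatedComment:
    """Hypothetical record layout; not the released ComMA file format."""
    comment_id: str
    text: str
    language: str                      # e.g. "Meitei", "Bangla", "Hindi", "Indian English"
    parent_id: Optional[str]           # previous comment in the thread, None for a root
    aggression_path: Tuple[str, ...]   # hierarchical tag, coarse to fine (placeholder values)
    discourse_role: Optional[str]      # role with respect to the previous comment (placeholder)


def thread_context(comment_id: str,
                   comments: Dict[str, AnnotatedComment]) -> List[AnnotatedComment]:
    """Return the ancestors of a comment, ordered from thread root to direct parent."""
    chain: List[AnnotatedComment] = []
    seen = set()
    current = comments[comment_id].parent_id
    # Walk up parent links; stop at a root, a missing parent, or a cycle.
    while current is not None and current in comments and current not in seen:
        seen.add(current)
        chain.append(comments[current])
        current = comments[current].parent_id
    chain.reverse()
    return chain


# Toy thread with placeholder text and tags (assumed values, not real annotations).
comments = {
    "c1": AnnotatedComment("c1", "root comment", "Hindi", None,
                           ("<top-level tag>",), None),
    "c2": AnnotatedComment("c2", "first reply", "Hindi", "c1",
                           ("<top-level tag>", "<subtype tag>"), "<role tag>"),
    "c3": AnnotatedComment("c3", "second reply", "Indian English", "c2",
                           ("<top-level tag>",), "<role tag>"),
}

context_ids = [c.comment_id for c in thread_context("c3", comments)]
```

Storing the tag as a coarse-to-fine tuple keeps the hierarchy explicit, and resolving context through parent links means the thread does not have to be duplicated in every record.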
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression
