Aggression-annotated Corpus of Hindi-English Code-mixed Data
Ritesh Kumar, Aishwarya N. Reganti, Akshit Bhatia, Tushar Maheshwari

TL;DR
This paper introduces an annotated corpus of Hindi-English code-mixed social media data with aggression labels, aiming to facilitate research on detecting online aggression and hate speech in multilingual contexts.
Contribution
The paper develops an aggression annotation scheme and presents a large, publicly available Hindi-English code-mixed dataset, collected from Twitter and Facebook, for NLP research.
Findings
Annotated corpus with 39,000 social media comments and posts.
Hierarchical aggression tagset with 3 top-level and 10 sub-tags.
Dataset released for future research in aggression detection.
Abstract
As interaction over the web has increased, incidents of aggression and related events like trolling, cyberbullying, flaming, hate speech, etc. have also increased manifold across the globe. While most of these behaviours, like bullying or hate speech, predate the Internet, the reach and extent of the Internet have given them unprecedented power and influence to affect the lives of billions of people. It is therefore of utmost importance that some preventive measures be taken to safeguard the people using the web, so that the web remains a viable medium of communication and connection in general. In this paper, we discuss the development of an aggression tagset and an annotated corpus of Hindi-English code-mixed data from two of the most popular social networking and social media platforms in India, Twitter and Facebook. The corpus is annotated using a…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
