Automatic verbal aggression detection for Russian and American imageboards

Denis Gordeev

arXiv:1604.06648·cs.CL·April 25, 2016·2 cites

Open Access

TL;DR

This paper explores automatic detection of verbal aggression on American (4chan.org) and Russian (2ch.hk) imageboards using word2vec, reporting an 88% result for English; the results for Russian still need improvement.

Contribution

It applies word2vec to aggression detection on Russian and American imageboards, using a set of 1,802,789 messages.

Findings

01

88% accuracy for English aggression detection

02

Dataset of 1,802,789 messages used

03

Results for Russian need further improvement

Abstract

Aggression is rampant in Internet communities. Anonymous forums, usually called imageboards, are notorious for aggressive and deviant behaviour even in comparison with other Internet communities. This study examines ways of automatically detecting verbal expressions of aggression on the most popular American (4chan.org) and Russian (2ch.hk) imageboards. A set of 1,802,789 messages was used for this study. The machine learning algorithm word2vec was applied to detect the state of aggression. A decent result is obtained for English (88%); the results for Russian are yet to be improved.

Taxonomy

Topics: Spam and Phishing Detection · Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling