An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Sandip Modha, Prasenjit Majumder

TL;DR
This study compares text representation schemes for user aggression detection and fact detection on multilingual social media data, finding that pre-trained word embeddings such as fastText outperform traditional methods in certain contexts.
Contribution
It provides an empirical comparison of multiple text representation techniques, including BoW, word embeddings, and transfer learning models, on multilingual social media tasks.
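As a rough illustration of the simplest scheme in that comparison, the sketch below builds bag-of-words (BoW) count vectors in plain Python. It is not the authors' pipeline: the helper names `build_vocabulary` and `bow_vector`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():  # assumption: whitespace tokenization
            vocab.setdefault(token, len(vocab))
    return vocab


def bow_vector(doc, vocab):
    """Return the term-count vector of `doc` over `vocab`; unknown tokens are ignored."""
    counts = Counter(doc.lower().split())
    vector = [0] * len(vocab)
    for token, count in counts.items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


# Toy posts, invented for illustration only.
posts = ["you are great", "you are you"]
vocab = build_vocabulary(posts)
vectors = [bow_vector(post, vocab) for post in posts]
```

Each post becomes a fixed-length count vector that a classical classifier can consume directly. Word order is discarded, which is the main contrast with the embedding-based and sequence models the paper also evaluates.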
Findings
BoW outperforms word embeddings on machine learning classifiers.
Pre-trained word embeddings like fastText yield the best weighted F1-score.
Deep neural models are more robust on lexically different datasets.
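The weighted F1-score named in these findings averages per-class F1 with each class weighted by its share of the true labels. The sketch below is a minimal plain-Python version of that standard definition, using the three aggression classes from the abstract; the function name `weighted_f1` and the toy labels are assumptions for illustration, not the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * n_true / total
    return score


# Toy labels, invented for illustration only.
gold = ["Non-aggressive", "Non-aggressive", "Covertly Aggressive", "Overtly Aggressive"]
pred = ["Non-aggressive", "Covertly Aggressive", "Covertly Aggressive", "Overtly Aggressive"]
result = weighted_f1(gold, pred)
```

Because the weights follow class frequency, a model can score well on this metric while still doing poorly on a rare class, which is worth keeping in mind when class distributions are skewed.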
Abstract
This paper attempts to study the effectiveness of text representation schemes on two tasks, namely User Aggression and Fact Detection from social media content. In User Aggression detection, the aim is to identify the level of aggression in content generated on social media and written in English, Devanagari Hindi, and Romanized Hindi. Aggression levels are categorized into three predefined classes, namely `Non-aggressive`, `Overtly Aggressive`, and `Covertly Aggressive`. During disaster-related incidents, social media platforms such as Twitter are flooded with millions of posts. In such emergency situations, identifying factual posts is important for organizations involved in relief operations. We treat this problem as a combination of a classification and a ranking problem. This paper presents a comparison of various text representation schemes based on BoW…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Sentiment Analysis and Opinion Mining
Methods: Dropout · GloVe Embeddings · Skip-gram Word2Vec · Adam · Sigmoid Activation · Tanh Activation · Temporal Activation Regularization · DropConnect · Long Short-Term Memory · Activation Regularization
