Coarse and Fine-Grained Hostility Detection in Hindi Posts using Fine Tuned Multilingual Embeddings

Arkadipta De; Venkatesh E; Kaushal Kumar Maurya; Maunendra Sankar Desarkar

arXiv:2101.04998 · cs.CL · January 14, 2021 · 1 citation

Open Access · 1 Repo

TL;DR

This paper presents a neural network approach using multilingual BERT to detect various types of hostility in Hindi social media posts, achieving state-of-the-art results despite resource constraints.

Contribution

It introduces a multi-label classification framework for hostility detection in Hindi using fine-tuned multilingual embeddings, outperforming existing baselines.

Findings

01 Achieved high F1 scores for multiple hostility categories.

02 Outperformed baseline models with a novel One-vs-the-Rest approach.

03 Established a new state-of-the-art for Hindi hostility detection.

Abstract

Due to the wide adoption of social media platforms like Facebook, Twitter, etc., there is an emerging need of detecting online posts that can go against the community acceptance standards. The hostility detection task has been well explored for resource-rich languages like English, but is unexplored for resource-constrained languages like Hindi due to the unavailability of large suitable data. We view this hostility detection as a multi-label multi-class classification problem. We propose an effective neural network-based technique for hostility detection in Hindi posts. We leverage pre-trained multilingual Bidirectional Encoder Representations of Transformer (mBERT) to obtain the contextual representations of Hindi posts. We have performed extensive experiments including different pre-processing techniques, pre-trained models, neural architectures, hybrid strategies, etc. Our best…
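
To make the multi-label framing and the One-vs-the-Rest idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each post has already been turned into a fixed-length feature vector (in the paper these would come from mBERT; here they are tiny hand-made vectors), and it trains one independent binary classifier per fine-grained label. The label names, the toy data, the logistic-regression classifier, and the rule that a post is "hostile" whenever any fine-grained label fires are all illustrative assumptions.

```python
import math

# Illustrative fine-grained label names (an assumption, not taken from the page above).
LABELS = ["fake", "hate", "offensive", "defamation"]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    """Linear score w.x + b for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_binary(X, y, epochs=500, lr=0.5):
    """Fit one logistic-regression classifier with plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            grad = sigmoid(score(w, b, x)) - target  # gradient of log loss w.r.t. the score
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def train_one_vs_rest(X, Y):
    """Train one independent binary classifier per label (that label vs. the rest)."""
    return {
        label: train_binary(X, [1.0 if label in labels else 0.0 for labels in Y])
        for label in LABELS
    }


def predict(models, x, threshold=0.5):
    """Return (coarse label, set of fine-grained labels) for one feature vector."""
    fine = {
        label
        for label, (w, b) in models.items()
        if sigmoid(score(w, b, x)) >= threshold
    }
    # Simplification made here: the coarse label is derived from the fine-grained ones.
    coarse = "hostile" if fine else "non-hostile"
    return coarse, fine


# Toy stand-ins for post embeddings; a post may carry several labels or none.
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
Y = [{"fake"}, {"hate"}, {"offensive"}, {"defamation"}, {"fake", "hate"}, set()]

models = train_one_vs_rest(X, Y)
print(predict(models, [1, 1, 0, 0]))
print(predict(models, [0, 0, 0, 0]))
```

Because every label gets its own classifier and its own threshold, a single post can receive several fine-grained labels at once, which is what distinguishes this multi-label setup from ordinary single-label multi-class classification.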

Code & Models

Repositories

Arko98/Hostility-Detection-in-Hindi-Constraint-2021
PyTorch · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Advanced Malware Detection Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Attention Is All You Need · Byte Pair Encoding · Multi-Head Attention · Dropout · Layer Normalization