Prediction Uncertainty Estimation for Hate Speech Classification
Kristian Miok, Dong Nguyen-Doan, Blaž Škrlj, Daniela Zaharie, and Marko Robnik-Šikonja

TL;DR
This paper introduces a method for hate speech detection that estimates prediction uncertainty using Monte Carlo dropout, enhancing model reliability and interpretability in classifying harmful content.
Contribution
The paper adapts deep neural networks to estimate prediction uncertainty in hate speech detection, providing explanations and reliability assessments.
Findings
Effective uncertainty estimation improves detection reliability.
Visualization techniques aid understanding of classification confidence.
Method performs well across different text embeddings.
Abstract
As a result of the popularity of social networks, hate speech has increased significantly in recent years. Due to its harmful effect on minority groups as well as on large communities, there is a pressing need for hate speech detection and filtering. However, automatic approaches should not jeopardize free speech, so they should accompany their decisions with explanations and an assessment of uncertainty. Thus, there is a need for predictive machine learning models that not only detect hate speech but also help users understand when texts cross the line and become unacceptable. The reliability of predictions is usually not addressed in text classification. We fill this gap by proposing an adaptation of deep neural networks that can efficiently estimate prediction uncertainty. To reliably detect hate speech, we use Monte Carlo dropout regularization, which mimics Bayesian inference…
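The abstract does not spell out how Monte Carlo dropout yields an uncertainty estimate, so here is a minimal sketch of the general technique. It assumes a hypothetical one-hidden-layer classifier over precomputed text embeddings with random weights; this is not the authors' architecture, and the function name `mc_dropout_predict` and its parameters are illustrative. Dropout stays active at prediction time, the model is run T times with a fresh dropout mask each time, the averaged softmax output is taken as the prediction, and the spread across passes (per-class standard deviation and predictive entropy) serves as the uncertainty estimate.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, p=0.5, n_samples=100, seed=0):
    """Monte Carlo dropout for a hypothetical one-hidden-layer classifier.

    Dropout is kept ON at prediction time: each of the n_samples forward
    passes draws a fresh Bernoulli mask, and the softmax outputs are averaged.
    Returns (mean class probabilities, per-class std, predictive entropy).
    """
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)                  # ReLU hidden layer
        mask = rng.random(h.shape) >= p                   # keep each unit with prob 1 - p
        h = h * mask / (1.0 - p)                          # inverted-dropout rescaling
        logits = h @ w2 + b2
        logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
        e = np.exp(logits)
        probs.append(e / e.sum(axis=-1, keepdims=True))   # softmax for this pass
    probs = np.stack(probs)                               # (n_samples, batch, classes)
    mean = probs.mean(axis=0)                             # MC estimate of predictive distribution
    std = probs.std(axis=0)                               # disagreement between passes
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1) # predictive entropy per text
    return mean, std, entropy


# Usage with random stand-in data: 4 "texts" as 16-dim embeddings, 2 classes
# (index 1 is treated as the hypothetical "hate" class).
rng = np.random.default_rng(42)
x = rng.normal(size=(4, 16))
w1, b1 = rng.normal(size=(16, 32)) * 0.3, np.zeros(32)
w2, b2 = rng.normal(size=(32, 2)) * 0.3, np.zeros(2)
mean, std, entropy = mc_dropout_predict(x, w1, b1, w2, b2)
for i in range(len(x)):
    print(f"text {i}: P(hate)={mean[i, 1]:.2f} ± {std[i, 1]:.2f}, entropy={entropy[i]:.2f}")
```

Texts whose averaged probability sits near 0.5, or whose standard deviation across passes is large, are the ones where the model is least certain; more passes (`n_samples`) give a more stable estimate at a proportional compute cost.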
Taxonomy
Methods: Monte Carlo Dropout · Dropout
