Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection
Atanu Mandal, Gargi Roy, Amit Barman, Indranil Dutta, Sudip Kumar Naskar

TL;DR
This paper introduces a Transformer-based multimodal approach with an Attentive Fusion layer to detect hate speech using both audio and text, significantly outperforming previous methods.
Contribution
The novel Attentive Fusion layer effectively combines audio and textual data within a Transformer framework for hate speech detection.
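This summary does not say how the Attentive Fusion layer is built, so the following is only a generic illustration of attention-based fusion, not the authors' layer. It assumes the text and audio encoders emit vectors of the same width, lets each text vector attend over the audio vectors with scaled dot-product attention, and concatenates the result. The names `cross_attend` and `fuse` are hypothetical, and the learned projections a real Transformer layer would use are omitted for brevity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """For each query, return an attention-weighted average of keys_values.

    Assumption: keys and values are the same vectors (no learned projections).
    """
    d = len(keys_values[0])
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys_values])
        attended.append(
            [sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)]
        )
    return attended


def fuse(text_vecs, audio_vecs):
    """Concatenate each text vector with its attended audio summary."""
    attended = cross_attend(text_vecs, audio_vecs)
    return [t + a for t, a in zip(text_vecs, attended)]


# Hypothetical toy embeddings: 2 text tokens, 3 audio frames, width 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused = fuse(text, audio)  # 2 vectors of width 4
```

In a trained model the fused vectors would typically feed a classification head; here they only show how one modality can weight the other.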
Findings
Achieved a macro F1 score of 0.927 on the test set.
Outperformed previous state-of-the-art techniques.
Demonstrated effectiveness of multimodal analysis in hate speech detection.
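For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as the majority class. The sketch below computes it from scratch on hypothetical toy labels (1 = hate, 0 = not hate); the labels are illustrative only and are not data from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not results from the paper.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]
score = macro_f1(y_true, y_pred)
```

Each class here has two true positives, one false positive, and one false negative, so both per-class F1 scores and their mean equal 2/3.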
Abstract
With the recent surge in social media usage, scrutinizing social media content for hateful material is of utmost importance. For the past decade, researchers have worked on distinguishing content that promotes hatred from content that does not. Traditionally, the main focus has been on textual content, although recent research has also begun to address audio-based content. However, studies have shown that relying solely on audio or text may be ineffective, as individuals often employ sarcasm in their speech and writing. To overcome these challenges, we present an approach that uses both audio and textual representations to identify whether a speech promotes hate. Our methodology is based on the Transformer framework that…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Layer Normalization · Adam · Residual Connection · Dropout · Linear Layer · Multi-Head Attention · Byte Pair Encoding
