Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
Eftekhar Hossain, Omar Sharif, Mohammed Moshiul Hoque, Sarah M. Preum

TL;DR
This paper introduces a context-aware attention framework that aligns visual and textual features for improved multimodal hateful content detection across English and low-resource languages, demonstrating significant performance gains.
Contribution
It proposes a novel attention-based alignment method for multimodal features, effective for both English and low-resource languages, advancing hateful content detection.
Findings
Achieved F1-scores of 69.7% on Bengali code-mixed data and 70.3% on English data.
Improved F1-score over state-of-the-art methods by approximately 2.5% on the Bengali code-mixed data and 3.2% on the English data.
Validated effectiveness across multilingual datasets.
Abstract
Multimodal hateful content detection is a challenging task that requires complex reasoning across visual and textual modalities. Therefore, creating a meaningful multimodal representation that effectively captures the interplay between visual and textual features through intermediate fusion is critical. Conventional fusion techniques are unable to attend to the modality-specific features effectively. Moreover, most studies have concentrated exclusively on English and overlooked low-resource languages. This paper proposes a context-aware attention framework for multimodal hateful content detection and assesses it for both English and non-English languages. The proposed approach incorporates an attention layer to meaningfully align the visual and textual features. This alignment enables selective focus on modality-specific features before fusing them. We evaluate the proposed approach on…
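The "align, then fuse" idea described in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the scaled dot-product attention, the mean pooling, the concatenation step, the feature sizes, and the function name `align_then_fuse` are all assumptions chosen for illustration. It only shows one plausible way an attention layer might let each text token select relevant visual features before the two modalities are combined.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def align_then_fuse(text_feats, image_feats):
    """Hypothetical alignment step (an assumption, not the paper's exact layer).

    text_feats:  (T, d) array, one row per text token.
    image_feats: (R, d) array, one row per image region.
    Returns the fused vector of size 2*d and the (T, R) attention weights.
    """
    d = text_feats.shape[-1]
    # Similarity between every text token and every image region.
    scores = text_feats @ image_feats.T / np.sqrt(d)   # (T, R)
    # Each text token distributes its attention over the image regions.
    attn = softmax(scores, axis=-1)                     # rows sum to 1
    # Visual features re-expressed in the order of the text tokens.
    aligned_visual = attn @ image_feats                 # (T, d)
    # Pool each modality, then fuse by concatenation (assumed fusion choice).
    text_vec = text_feats.mean(axis=0)
    visual_vec = aligned_visual.mean(axis=0)
    fused = np.concatenate([text_vec, visual_vec])      # (2*d,)
    return fused, attn

# Random stand-ins for encoder outputs: 5 text tokens, 3 image regions, d = 8.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 8))
image_feats = rng.normal(size=(3, 8))
fused, attn = align_then_fuse(text_feats, image_feats)
print(fused.shape, attn.shape)
```

In a full model the fused vector would typically feed a classification head, and the text and image features would come from pretrained encoders. Those components are left out here because the truncated abstract does not specify them.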
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts
Methods: Focus · ALIGN
