Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection

Eftekhar Hossain; Omar Sharif; Mohammed Moshiul Hoque; Sarah M. Preum

arXiv:2402.09738·cs.CL·February 16, 2024·2 cites

TL;DR

This paper introduces a context-aware attention framework that aligns visual and textual features before fusing them for multimodal hateful content detection. Evaluated on English and Bengali code-mixed data, it outperforms state-of-the-art methods by approximately 2.5% and 3.2%.

Contribution

The paper proposes an attention-based method that aligns visual and textual features before fusion, and evaluates it on both English and a low-resource setting (Bengali code-mixed data).

Findings

01. Achieved F1-scores of 69.7% on Bengali code-mixed data and 70.3% on English data.

02. Improved on state-of-the-art methods by approximately 2.5% and 3.2% on the Bengali code-mixed and English data, respectively.

03. Validated the approach on both a Bengali code-mixed dataset and an English dataset.

Abstract

Multimodal hateful content detection is a challenging task that requires complex reasoning across visual and textual modalities. Therefore, creating a meaningful multimodal representation that effectively captures the interplay between visual and textual features through intermediate fusion is critical. Conventional fusion techniques are unable to attend to the modality-specific features effectively. Moreover, most studies exclusively concentrated on English and overlooked other low-resource languages. This paper proposes a context-aware attention framework for multimodal hateful content detection and assesses it for both English and non-English languages. The proposed approach incorporates an attention layer to meaningfully align the visual and textual features. This alignment enables selective focus on modality-specific features before fusing them. We evaluate the proposed approach on…
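The abstract describes an attention layer that aligns visual and textual features before they are fused. The snippet below is a minimal, dependency-free sketch of that general idea, not the authors' architecture: it applies plain scaled dot-product attention in both directions (text over image regions, image regions over text tokens) and concatenates the pooled results. The function names (`align`, `align_then_fuse`), the absence of learned projection weights, the mean pooling, the concatenation fusion, and the toy feature vectors are all assumptions made for illustration. It also assumes both modalities share the same feature dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def align(queries, context):
    """Scaled dot-product attention (hypothetical helper).

    Each query vector is replaced by a weighted average of the context
    vectors, weighted by query-context similarity.
    """
    scale = math.sqrt(len(queries[0]))
    dim = len(context[0])
    aligned = []
    for q in queries:
        weights = softmax([dot(q, c) / scale for c in context])
        aligned.append(
            [sum(w * c[d] for w, c in zip(weights, context)) for d in range(dim)]
        )
    return aligned


def mean_pool(vectors):
    """Average a list of vectors into a single vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def align_then_fuse(text_feats, image_feats):
    """Align each modality against the other, then fuse by concatenation.

    Assumption: concatenation of mean-pooled aligned features stands in
    for whatever fusion the paper actually uses.
    """
    text_ctx = align(text_feats, image_feats)   # text tokens attend to image regions
    image_ctx = align(image_feats, text_feats)  # image regions attend to text tokens
    return mean_pool(text_ctx) + mean_pool(image_ctx)


# Toy inputs: 3 text-token vectors and 2 image-region vectors, both 2-dimensional.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[0.5, 0.5], [1.0, 0.0]]

fused = align_then_fuse(text_feats, image_feats)
print(len(fused))  # 4: two pooled 2-dimensional vectors, concatenated
```

In a trained model the queries, keys, and values would normally pass through learned linear projections, and the fused vector would feed a classifier; both are omitted here to keep the alignment step itself visible.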

Code & Models

Repositories

eftekhar-hossain/bengali-hateful-memes
tf · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts

Methods: Focus · ALIGN