Retrieval Augmented Enhanced Dual Co-Attention Framework for Target Aware Multimodal Bengali Hateful Meme Detection

Raihan Tanvir; Md. Golam Rabiul Alam

arXiv:2602.19212·cs.CL·February 24, 2026

Open Access

TL;DR

This paper introduces a retrieval-augmented dual co-attention framework for detecting hate speech in Bengali multimodal memes, addressing low-resource challenges with dataset augmentation, cross-modal learning, and non-parametric inference.

Contribution

The paper proposes xDORA, a framework that combines vision encoders and multilingual text encoders with retrieval-based reasoning to improve hateful meme detection in Bengali.

Findings

01

xDORA achieves macro F1-scores of 0.78 for hateful meme detection and 0.71 for target detection.

02

RAG-Fused DORA raises these scores to 0.79 and 0.74, respectively, surpassing the baseline.

03

The FAISS-based k-nearest neighbor classifier is robust on rare classes.
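
The findings above are reported as macro F1, which averages per-class F1 without weighting by class frequency, so rare classes count as much as common ones. The following is a minimal sketch of that metric, not the authors' evaluation code; the function name `macro_f1` and the label strings in the usage note are illustrative assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)

    # Every class contributes equally, regardless of how many samples it has.
    return sum(per_class) / len(per_class)
```

For example, `macro_f1(["hate", "hate", "not", "not"], ["hate", "not", "not", "not"])` averages a per-class F1 of 2/3 for `"hate"` and 4/5 for `"not"`.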

Abstract

Hateful content on social media increasingly appears as multimodal memes that combine images and text to convey harmful narratives. In low-resource languages such as Bengali, automated detection remains challenging due to limited annotated data, class imbalance, and pervasive code-mixing. To address these issues, we augment the Bengali Hateful Memes (BHM) dataset with semantically aligned samples from the Multimodal Aggression Dataset in Bengali (MIMOSA), improving both class balance and semantic diversity. We propose the Enhanced Dual Co-attention Framework (xDORA), integrating vision encoders (CLIP, DINOv2) and multilingual text encoders (XGLM, XLM-R) via weighted attention pooling to learn robust cross-modal representations. Building on these embeddings, we develop a FAISS-based k-nearest neighbor classifier for non-parametric inference and introduce RAG-Fused DORA, which…
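
The non-parametric inference step mentioned in the abstract can be illustrated with a small k-nearest neighbor classifier over embedding vectors. This is a sketch under stated assumptions, not the paper's implementation: it uses a brute-force cosine search in pure Python in place of a FAISS index, and the similarity-weighted vote, the names `cosine` and `knn_predict`, and the toy vectors are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, bank_vectors, bank_labels, k=3):
    """Label a query embedding by a vote among its k most similar neighbors.

    Assumption: votes are weighted by cosine similarity. A real system
    would replace the exhaustive scan below with an index lookup.
    """
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in zip(bank_vectors, bank_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings" standing in for fused image-text representations.
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["hate", "hate", "not_hate", "not_hate"]
prediction = knn_predict([1.0, 0.05], bank, labels, k=3)
```

Because no classifier weights are trained, new labeled examples can be added to the bank without retraining, which is one reason a neighbor-based approach is attractive when some classes have few samples.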

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts