Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection

Md. Mithun Hossain; Md. Shakil Hossain; Sudipto Chaki; M. F. Mridha

arXiv:2505.19010·cs.CV·July 31, 2025

TL;DR

This paper introduces Co-AttenDWG, a novel multi-modal fusion method combining co-attention, dimension-wise gating, and expert fusion to improve offensive content detection by enhancing cross-modal interactions.

Contribution

It proposes a new fusion framework that integrates co-attention, gating, and expert fusion for better multi-modal feature interaction and alignment.

Findings

01. Achieves state-of-the-art results on the MIMIC and SemEval Memotion 1.0 datasets.

02. Demonstrates improved cross-modal alignment and robustness.

03. Outperforms existing multi-modal fusion methods.

Abstract

Multi-modal learning has emerged as a crucial research direction, as integrating textual and visual information can substantially enhance performance in tasks such as classification, retrieval, and scene understanding. Despite advances with large pre-trained models, existing approaches often suffer from insufficient cross-modal interactions and rigid fusion strategies, and so fail to fully harness the complementary strengths of different modalities. To address these limitations, we propose Co-AttenDWG, which combines co-attention, dimension-wise gating, and expert fusion. Our approach first projects textual and visual features into a shared embedding space, where a dedicated co-attention mechanism enables simultaneous, fine-grained interactions between modalities. This is further strengthened by a dimension-wise gating network, which adaptively modulates feature contributions at the channel level to…
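The first three stages named in the abstract (shared projection, co-attention, dimension-wise gating) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the abstract gives no formulas, so the scaled dot-product attention, the sigmoid gate, the convex blend of original and attended features, the random weights, and all function and variable names below are assumptions chosen for illustration. The expert-fusion stage is omitted because the excerpt does not describe it.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_matrix(rows, cols, rng):
    """Stand-in for learned weights or encoder features (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def co_attend(queries, context):
    """Each query row attends over all context rows (scaled dot-product, assumed form)."""
    scale = math.sqrt(len(queries[0]))
    scores = [[s / scale for s in row] for row in matmul(queries, transpose(context))]
    weights = softmax_rows(scores)
    return matmul(weights, context), weights

def dimension_wise_gate(original, attended, w_gate):
    """Blend original and attended features with one sigmoid gate per feature dimension."""
    concat = [o_row + a_row for o_row, a_row in zip(original, attended)]  # row-wise concatenation
    logits = matmul(concat, w_gate)
    gated = []
    for o_row, a_row, l_row in zip(original, attended, logits):
        gates = [sigmoid(l) for l in l_row]
        gated.append([g * a + (1.0 - g) * o for g, a, o in zip(gates, a_row, o_row)])
    return gated

rng = random.Random(0)
d_text, d_image, d_shared = 6, 5, 4            # hypothetical feature sizes
text_feats = random_matrix(3, d_text, rng)     # 3 text tokens
image_feats = random_matrix(2, d_image, rng)   # 2 image regions

# 1. Project both modalities into a shared embedding space.
text = matmul(text_feats, random_matrix(d_text, d_shared, rng))
image = matmul(image_feats, random_matrix(d_image, d_shared, rng))

# 2. Co-attention: text attends to image and image attends to text.
text_ctx, text_to_image = co_attend(text, image)
image_ctx, image_to_text = co_attend(image, text)

# 3. Dimension-wise gating: a separate gate value for every feature channel.
text_out = dimension_wise_gate(text, text_ctx, random_matrix(2 * d_shared, d_shared, rng))
image_out = dimension_wise_gate(image, image_ctx, random_matrix(2 * d_shared, d_shared, rng))
```

Because each gate lies strictly between 0 and 1, every output channel is a convex combination of the original feature and its cross-modal context, which is one plausible reading of "adaptively modulates feature contributions at the channel level".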
