Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
Md. Mithun Hossain, Md. Shakil Hossain, Sudipto Chaki, M. F. Mridha

TL;DR
This paper introduces Co-AttenDWG, a novel multi-modal fusion method combining co-attention, dimension-wise gating, and expert fusion to improve offensive content detection by enhancing cross-modal interactions.
Contribution
It proposes a new fusion framework that integrates co-attention, gating, and expert fusion for better multi-modal feature interaction and alignment.
Findings
Achieves state-of-the-art results on MIMIC and SemEval Memotion 1.0 datasets.
Demonstrates improved cross-modal alignment and robustness.
Outperforms existing multi-modal fusion methods.
Abstract
Multi-modal learning has emerged as a crucial research direction, as integrating textual and visual information can substantially enhance performance in tasks such as classification, retrieval, and scene understanding. Despite advances with large pre-trained models, existing approaches often suffer from insufficient cross-modal interactions and rigid fusion strategies, failing to fully harness the complementary strengths of different modalities. To address these limitations, we propose Co-AttenDWG, a framework that combines co-attention, dimension-wise gating, and expert fusion. Our approach first projects textual and visual features into a shared embedding space, where a dedicated co-attention mechanism enables simultaneous, fine-grained interactions between modalities. These interactions are further strengthened by a dimension-wise gating network, which adaptively modulates feature contributions at the channel level to…
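The first three steps named in the abstract (projection into a shared space, co-attention, and dimension-wise gating) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the parameter dictionary, the random weights, the scaled dot-product form of the attention, and the sigmoid gate that blends each feature with its attended counterpart are all assumptions chosen for illustration. The expert fusion stage is omitted because the visible text does not describe it.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attend(queries, keys_values):
    # Scaled dot-product attention (assumed form): each query row
    # becomes a weighted average of the rows of the other modality.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def dimension_wise_gate(own, attended, W_gate, b_gate):
    # One gate value per feature dimension (channel), in (0, 1).
    # The gate decides, per channel, how much of the original feature
    # to keep versus how much of the cross-modal feature to take.
    gate = sigmoid(np.concatenate([own, attended], axis=-1) @ W_gate + b_gate)
    return gate * own + (1.0 - gate) * attended


def init_params(d_text, d_image, d_shared, seed=0):
    # Hypothetical random parameters; a real model would learn these.
    rng = np.random.default_rng(seed)
    return {
        "W_text": rng.normal(scale=0.1, size=(d_text, d_shared)),
        "W_image": rng.normal(scale=0.1, size=(d_image, d_shared)),
        "W_gate_t": rng.normal(scale=0.1, size=(2 * d_shared, d_shared)),
        "b_gate_t": np.zeros(d_shared),
        "W_gate_v": rng.normal(scale=0.1, size=(2 * d_shared, d_shared)),
        "b_gate_v": np.zeros(d_shared),
    }


def co_attention_gated_fusion(text_feats, image_feats, params):
    # 1. Project both modalities into a shared embedding space.
    t = text_feats @ params["W_text"]
    v = image_feats @ params["W_image"]
    # 2. Co-attention: text attends to image and image attends to text.
    t_att = cross_attend(t, v)
    v_att = cross_attend(v, t)
    # 3. Dimension-wise gating of original vs. attended features.
    t_out = dimension_wise_gate(t, t_att, params["W_gate_t"], params["b_gate_t"])
    v_out = dimension_wise_gate(v, v_att, params["W_gate_v"], params["b_gate_v"])
    return t_out, v_out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    text = rng.normal(size=(5, 16))   # 5 token features of width 16
    image = rng.normal(size=(7, 32))  # 7 patch features of width 32
    t_out, v_out = co_attention_gated_fusion(text, image, init_params(16, 32, 8))
    print(t_out.shape, v_out.shape)
```

Because the gate is a vector rather than a scalar, each channel of the shared embedding can lean toward its own modality or toward the other one independently, which is one plausible reading of "modulates feature contributions at the channel level".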
