MEME-Fusion@CHiPSAL 2026: Multimodal Ablation Study of Hate Detection and Sentiment Analysis on Nepali Memes

Samir Wagle; Reewaj Khanal; Abiral Adhikari

arXiv:2604.14218 · cs.CL · April 17, 2026

TL;DR

This paper presents a cross-modal attention fusion architecture for hate speech detection and sentiment classification in Nepali memes. It reports a 5.9% F1-macro improvement over text-only baselines and highlights challenges specific to low-resource, Devanagari-script settings.

Contribution

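The paper proposes a hybrid cross-modal attention model that combines a CLIP (ViT-B/32) visual encoder with a BGE-M3 multilingual text encoder, and it analyzes how English-centric vision models and ensemble methods break down under script mismatch and data scarcity.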

Findings

01. Explicit cross-modal reasoning improves F1-macro by 5.9% over text-only baselines.

02. English-centric vision models perform poorly on Devanagari script.

03. Ensemble methods can degrade under data scarcity due to overfitting.
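
Finding 01 is stated in terms of F1-macro, which averages per-class F1 scores without weighting by class frequency, so minority classes count as much as majority ones. The sketch below shows how the metric is computed on a three-class sentiment task. The function name `f1_macro` and the toy labels are illustrative assumptions, not data or code from the paper, and the listing does not say whether the 5.9% figure is absolute or relative.

```python
def f1_macro(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores.

    A class with no true and no predicted instances scores 0.0 here;
    that convention is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels (made up for illustration), using the Subtask B classes.
y_true = ["negative", "negative", "neutral", "positive", "positive", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive", "positive", "negative"]

score = f1_macro(y_true, y_pred)
print(f"F1-macro: {score:.4f}")
```

Because each class contributes equally, a model that ignores a rare class is penalized heavily, which is why the metric suits imbalanced hate speech data.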

Abstract

Hate speech detection in Devanagari-scripted social media memes presents compounded challenges: multimodal content structure, script-specific linguistic complexity, and extreme data scarcity in low-resource settings. This paper presents our system for the CHiPSAL 2026 shared task, addressing both Subtask A (binary hate speech detection) and Subtask B (three-class sentiment classification: positive, neutral, negative). We propose a hybrid cross-modal attention fusion architecture that combines CLIP (ViT-B/32) for visual encoding with BGE-M3 for multilingual text representation, connected through 4-head self-attention and a learnable gating network that dynamically weights modality contributions on a per-sample basis. Systematic evaluation across eight model configurations demonstrates that explicit cross-modal reasoning achieves a 5.9% F1-macro improvement over text-only baselines on…
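
The per-sample gating idea in the abstract can be illustrated with a minimal sketch. It assumes the image and text features have already been projected to a common dimension, and that the gate is a single linear layer over their concatenation followed by a softmax across the two modalities. Both are assumptions made here for illustration; the abstract does not specify the gate's form, and the 4-head attention stage is omitted. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(img_vec, txt_vec, gate_weights, gate_bias):
    """Fuse two same-sized modality vectors with per-sample gate weights.

    gate_weights holds two rows (one logit per modality), each of length
    2 * dim, applied to the concatenation [img_vec; txt_vec].
    This layout is a hypothetical choice, not the paper's specification.
    """
    if len(img_vec) != len(txt_vec):
        raise ValueError("modality vectors must share a dimension")
    concat = img_vec + txt_vec
    logits = [
        sum(w * x for w, x in zip(row, concat)) + b
        for row, b in zip(gate_weights, gate_bias)
    ]
    # The gate depends on the input, so each sample gets its own weights.
    g_img, g_txt = softmax(logits)
    fused = [g_img * i + g_txt * t for i, t in zip(img_vec, txt_vec)]
    return fused, (g_img, g_txt)


# Toy inputs standing in for projected image and text embeddings.
rng = random.Random(0)
dim = 4
img_vec = [rng.uniform(-1, 1) for _ in range(dim)]
txt_vec = [rng.uniform(-1, 1) for _ in range(dim)]
gate_weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(2)]
gate_bias = [0.0, 0.0]

fused, (g_img, g_txt) = gated_fusion(img_vec, txt_vec, gate_weights, gate_bias)
print(f"gate(image)={g_img:.3f}, gate(text)={g_txt:.3f}")
```

In a trained model the gate parameters would be learned jointly with the classifier, so a meme whose meaning is carried mostly by its text could receive a larger text weight than one driven by its image.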

Code & Models

Repositories

Tri-Yantra-Technologies/MEME-Fusion (GitHub)
