Labels or Input? Rethinking Augmentation in Multimodal Hate Detection

Sahajpreet Singh; Kokil Jaidka; Subhayan Mukerjee

arXiv:2508.11808 · cs.CV · January 27, 2026

TL;DR

This paper explores how prompt optimization, fine-tuning, and automated data augmentation can significantly improve small multimodal hate detection models, making them more robust and deployable without relying on large, costly models.

Contribution

It introduces an end-to-end pipeline for prompt design and a multimodal augmentation framework that enhances small models' ability to detect implicit hate in memes.

Findings

1. Structured prompts and scaled supervision improve small model performance.

2. Counterfactually neutral meme generation reduces spurious correlations.

3. Prompt design and targeted augmentation narrow the gap between small and large models.

Abstract

Online hate remains a significant societal challenge, especially as multimodal content enables subtle, culturally grounded, and implicit forms of harm. Hateful memes embed hostility through text-image interactions and humor, making them difficult for automated systems to interpret. Although recent Vision-Language Models (VLMs) perform well on explicit cases, their deployment is limited by high inference costs and persistent failures on nuanced content. This work examines how far small models can be improved through prompt optimization, fine-tuning, and automated data augmentation. We introduce an end-to-end pipeline that varies prompt structure, label granularity, and training modality, showing that structured prompts and scaled supervision significantly strengthen compact VLMs. We also develop a multimodal augmentation framework that generates counterfactually neutral memes via a…

Code & Models

Datasets

sahajps/Meme-Sanity
dataset · 184 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection