# "Humor, Art, or Misinformation?": A Multimodal Dataset for Intent-Aware Synthetic Image Detection

**Authors:** Anastasios Skoularikis, Stefanos-Iordanis Papadopoulos, Symeon Papadopoulos, Panagiotis C. Petrantonakis

arXiv: 2508.20670 · 2025-09-10

## TL;DR

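This paper introduces S-HArM, a multimodal dataset of 9,576 image-text pairs from Twitter/X and Reddit for classifying the intent behind AI-generated images (Humor/Satire, Art, or Misinformation), and compares synthetic training-data strategies and model families for intent detection on "in the wild" content.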

## Contribution

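It presents S-HArM, a new dataset for intent-aware classification of synthetic images; explores three prompting strategies (image-guided, description-guided, and multimodally-guided) for building a large-scale synthetic training set with Stable Diffusion; and compares modality fusion, contrastive learning, reconstruction networks, attention mechanisms, and large vision-language models on the task.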

## Key findings

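- Models trained on image-guided and multimodally-guided synthetic data generalize better to "in the wild" content than models trained on description-guided data.
- The authors attribute this advantage to the visual context that image-guided and multimodally-guided generation preserves.
- Overall performance remains limited, highlighting the complexity of inferring intent and the need for specialized architectures.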

## Abstract

Recent advances in multimodal AI have enabled progress in detecting synthetic and out-of-context content. However, existing efforts largely overlook the intent behind AI-generated images. To fill this gap, we introduce S-HArM, a multimodal dataset for intent-aware classification, comprising 9,576 "in the wild" image-text pairs from Twitter/X and Reddit, labeled as Humor/Satire, Art, or Misinformation. Additionally, we explore three prompting strategies (image-guided, description-guided, and multimodally-guided) to construct a large-scale synthetic training dataset with Stable Diffusion. We conduct an extensive comparative study including modality fusion, contrastive learning, reconstruction networks, attention mechanisms, and large vision-language models. Our results show that models trained on image- and multimodally-guided data generalize better to "in the wild" content, due to preserved visual context. However, overall performance remains limited, highlighting the complexity of inferring intent and the need for specialized architectures.
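
To make the dataset description above concrete, here is a minimal Python sketch of how one labeled image-text pair could be represented and how the label balance of a collection could be inspected. Only the three intent labels and the two source platforms come from the abstract. The class names, field names, and the `label_distribution` helper are hypothetical and are not taken from the paper or its released data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    """The three intent labels named in the abstract."""
    HUMOR_SATIRE = "Humor/Satire"
    ART = "Art"
    MISINFORMATION = "Misinformation"


class Platform(Enum):
    """The two source platforms named in the abstract."""
    TWITTER_X = "Twitter/X"
    REDDIT = "Reddit"


@dataclass(frozen=True)
class ImageTextPair:
    # Hypothetical field names; the released dataset may use different ones.
    image_path: str
    text: str
    platform: Platform
    intent: Intent


def label_distribution(pairs):
    """Return the fraction of examples per intent label ({} if empty)."""
    counts = Counter(pair.intent for pair in pairs)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Iterate over the enum so labels with zero examples still appear.
    return {intent: counts[intent] / total for intent in Intent}
```

A record would then be built as `ImageTextPair("img/0001.png", "placeholder caption", Platform.REDDIT, Intent.ART)`, where the path and caption are invented placeholders rather than real dataset entries. Checking the label distribution before training is a common first step for a three-class task like this one, since class imbalance affects how headline accuracy should be read.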

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20670/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20670/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/2508.20670/full.md

---
Source: https://tomesphere.com/paper/2508.20670