MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations

Aaron Scott; Maike Züfle; Jan Niehues

arXiv:2510.24178 · cs.CL · March 5, 2026

TL;DR

MuSaG is a German multimodal sarcasm dataset built from statements in German television shows. Each instance has aligned text, audio, and video, with each modality annotated separately by humans, so sarcasm detection models can be evaluated in both unimodal and multimodal settings.

Contribution

This paper introduces MuSaG, the first German multimodal sarcasm dataset with full-modal annotations, and benchmarks nine open-source and commercial models on it to identify gaps in current multimodal sarcasm detection capabilities.

Findings

01

Humans rely heavily on audio cues for sarcasm detection.

02

Models perform best on the text modality, which points to a gap in their multimodal understanding.

03

MuSaG enables evaluation of multimodal sarcasm detection in realistic settings.

Abstract

Sarcasm is a complex form of figurative language in which the intended meaning contradicts the literal one. Its prevalence in social media and popular culture poses persistent challenges for natural language understanding, sentiment analysis, and content moderation. With the emergence of multimodal large language models, sarcasm detection extends beyond text and requires integrating cues from audio and vision. We present MuSaG, the first German multimodal sarcasm detection dataset, consisting of 33 minutes of manually selected and human-annotated statements from German television shows. Each instance provides aligned text, audio, and video modalities, annotated separately by humans, enabling evaluation in unimodal and multimodal settings. We benchmark nine open-source and commercial models, spanning text, audio, vision, and multimodal architectures, and compare their performance to…
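Because each modality is annotated separately, a model can be scored against the labels of the modality it actually received. The following is a minimal sketch of such a per-modality comparison. It assumes binary labels (1 = sarcastic) and uses F1 on the sarcastic class; the abstract does not state which metric the paper reports, and the function names and toy labels below are hypothetical, not taken from MuSaG.

```python
from typing import Dict, List


def f1_binary(gold: List[int], pred: List[int], positive: int = 1) -> float:
    """F1 score for the positive class (assumed here to mean 'sarcastic')."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def per_modality_f1(
    gold_by_modality: Dict[str, List[int]],
    pred_by_modality: Dict[str, List[int]],
) -> Dict[str, float]:
    """Score each modality's predictions against that modality's own labels."""
    return {
        modality: f1_binary(gold_by_modality[modality], pred_by_modality[modality])
        for modality in gold_by_modality
    }


# Toy labels, invented for illustration only (not MuSaG data).
gold = {
    "text": [1, 0, 1, 0],
    "audio": [1, 1, 1, 0],
    "video": [1, 0, 1, 0],
}
pred = {
    "text": [1, 0, 0, 0],
    "audio": [1, 0, 1, 0],
    "video": [0, 0, 0, 0],
}

scores = per_modality_f1(gold, pred)
for modality, score in scores.items():
    print(f"{modality}: F1 = {score:.3f}")
```

Keeping the gold labels keyed by modality lets the same statement carry different labels in text, audio, and video, which is the situation separate annotation is meant to capture.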

Code & Models

Datasets

sc0ttypee/MuSaG
dataset · 8 downloads
