Culture-Aware Humorous Captioning: Multimodal Humor Generation across Cultural Contexts

Run Xu; Lu Li; Rongzhao Zhang; Jie Xu

arXiv:2604.18091·cs.CL·April 21, 2026

TL;DR

This paper introduces a new multimodal task, culture-aware humorous captioning, which addresses the challenge of jointly maintaining image relevance, contextual appropriateness, and humor quality under a specified cultural context.

Contribution

It proposes a staged alignment framework with preference alignment and cultural adaptation, improving humor quality and contextual fit in cross-cultural captioning.

Findings

01

The proposed model improves contextual fit and humor quality.

02

The model shows large gains in cultural relevance and in balancing image relevance with humor.

03

The framework adapts effectively to Eastern cultural contexts with minimal supervision.

Abstract

Recent multimodal large language models have shown promising ability in generating humorous captions for images, yet they still lack stable control over explicit cultural context, making it difficult to jointly maintain image relevance, contextual appropriateness, and humor quality under a specified cultural background. To address this limitation, we introduce a new multimodal generation task, culture-aware humorous captioning, which requires a model to generate a humorous caption conditioned on both an input image and a target cultural context. Captions generated under different cultural contexts are not expected to share the same surface form, but should remain grounded in similar visual situations or humorous rationales. To support this task, we establish a six-dimensional evaluation framework covering image relevance, contextual fit, semantic richness, reasonableness, humor, and…
