HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation

Jiajun Zhang; Shijia Luo; Ruikang Zhang; Qi Su

arXiv:2511.21732 · cs.CL · March 25, 2026

Open Access

TL;DR

HUMORCHAIN introduces a theory-guided multi-stage reasoning framework for interpretable multimodal humor generation, integrating visual understanding with humor theories to produce more human-like, cognitively deep humorous image descriptions.

Contribution

This work is the first to embed cognitive humor theories into a structured reasoning process for multimodal humor generation, enhancing interpretability and alignment with human perception.

Findings

01 Outperforms state-of-the-art baselines in humor preference and scores
02 Demonstrates improved semantic diversity in generated humor
03 Validates the effectiveness of theory-guided reasoning in humor generation

Abstract

Humor, as both a creative human activity and a social binding mechanism, has long posed a major challenge for AI generation. Although producing humor requires complex cognitive reasoning and social understanding, theories of humor suggest that it follows learnable patterns and structures, making it theoretically possible for generative models to acquire them implicitly. In recent years, multimodal humor has become a prevalent form of online communication, especially among Gen Z, highlighting the need for AI systems capable of integrating visual understanding with humorous language generation. However, existing data-driven approaches lack explicit modeling or theoretical grounding of humor and often produce literal descriptions that fail to capture its underlying cognitive mechanisms. As a result, the generated image descriptions are fluent but lack genuine humor or cognitive depth.…

Taxonomy

Topics: Humor Studies and Applications · Multimodal Machine Learning Applications · Language, Metaphor, and Cognition