HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation
Jiajun Zhang, Shijia Luo, Ruikang Zhang, Qi Su

TL;DR
HUMORCHAIN introduces a theory-guided multi-stage reasoning framework for interpretable multimodal humor generation, integrating visual understanding with humor theories to produce more human-like, cognitively deep humorous image descriptions.
Contribution
This work is the first to embed cognitive humor theories into a structured reasoning process for multimodal humor generation, enhancing interpretability and alignment with human perception.
Findings
Outperforms state-of-the-art baselines on humor preference and scoring metrics
Demonstrates improved semantic diversity in generated humor
Validates the effectiveness of theory-guided reasoning in humor generation
Abstract
Humor, as both a creative human activity and a social binding mechanism, has long posed a major challenge for AI generation. Although producing humor requires complex cognitive reasoning and social understanding, theories of humor suggest that it follows learnable patterns and structures, making it theoretically possible for generative models to acquire them implicitly. In recent years, multimodal humor has become a prevalent form of online communication, especially among Gen Z, highlighting the need for AI systems capable of integrating visual understanding with humorous language generation. However, existing data-driven approaches lack explicit modeling or theoretical grounding of humor and often produce literal descriptions that fail to capture its underlying cognitive mechanisms. The resulting image descriptions are fluent but lack genuine humor or cognitive depth.…
Taxonomy
Topics: Humor Studies and Applications · Multimodal Machine Learning Applications · Language, Metaphor, and Cognition
