Assessing the Capabilities of LLMs in Humor: A Multi-dimensional Analysis of Oogiri Generation and Evaluation

Ritsu Sakabe; Hwichan Kim; Tosho Hirasawa; Mamoru Komachi

arXiv:2511.09133 · cs.CL · November 17, 2025

Open Access

TL;DR

This study systematically evaluates Large Language Models' humor capabilities using a multi-dimensional approach based on Japanese Oogiri comedy, revealing strengths in creativity but weaknesses in empathy and human-like humor assessment.

Contribution

Introduces a multi-dimensional evaluation framework for LLM humor, expands existing Oogiri datasets, and analyzes LLMs' creative and evaluative abilities across six humor dimensions.

Findings

01

LLMs generate humorous responses at a level between low- and mid-tier human performance.

02

LLMs lack empathy in humor, affecting their evaluation accuracy.

03

Humans prioritize empathy, while LLMs focus on novelty in humor assessment.

Abstract

Computational humor is a frontier for creating advanced and engaging natural language processing (NLP) applications, such as sophisticated dialogue systems. While previous studies have benchmarked the humor capabilities of Large Language Models (LLMs), they have often relied on single-dimensional evaluations, such as judging whether something is simply "funny." This paper argues that a multifaceted understanding of humor is necessary and addresses this gap by systematically evaluating LLMs through the lens of Oogiri, a form of Japanese improvisational comedy game. To achieve this, we expanded upon existing Oogiri datasets with data from new sources and then augmented the collection with Oogiri responses generated by LLMs. We then manually annotated this expanded collection with 5-point absolute ratings across six dimensions: Novelty, Clarity, Relevance, Intelligence, Empathy, and…

Taxonomy

Topics: Humor Studies and Applications · Language, Metaphor, and Cognition · Multimodal Machine Learning Applications