Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models

Zachary Horvitz; Jingru Chen; Rahul Aditya; Harshvardhan Srivastava; Robert West; Zhou Yu; Kathleen McKeown

arXiv:2403.00794·cs.CL·June 24, 2024·1 citation



TL;DR

This paper explores using large language models to generate synthetic humor data, demonstrating their ability to 'unfun' jokes and create challenging datasets for humor detection in multiple languages.

Contribution

It introduces an approach that uses LLMs to produce synthetic humor datasets, supporting humor detection research and easing the scarcity of paired humorous and non-humorous data.

Findings

01. LLMs can effectively 'unfun' jokes as rated by humans.

02. Synthetic data from GPT-4 is highly rated and challenging for humor classifiers.

03. The approach extends successfully to code-mixed English-Hindi humor datasets.

Abstract

Humor is a fundamental facet of human cognition and interaction. Yet, despite recent advances in natural language processing, humor detection remains a challenging task that is complicated by the scarcity of datasets that pair humorous texts with similar non-humorous counterparts. In our work, we investigate whether large language models (LLMs) can generate synthetic data for humor detection via editing texts. We benchmark LLMs on an existing human dataset and show that current LLMs display an impressive ability to 'unfun' jokes, as judged by humans and as measured on the downstream task of humor detection. We extend our approach to a code-mixed English-Hindi humor dataset, where we find that GPT-4's synthetic data is highly rated by bilingual annotators and provides challenging adversarial examples for humor classifiers.
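The core idea above, pairing each humorous text with an edited non-humorous counterpart, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `build_paired_dataset`, `pairwise_accuracy`, the lookup-table "editor", and the keyword scorer are all hypothetical stand-ins (a real setup would call an LLM to do the unfunning and a trained humor classifier to do the scoring). Pairwise accuracy is used here only as one plausible way to measure how hard the pairs are for a classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Example:
    text: str
    label: int    # 1 = humorous original, 0 = "unfunned" edit
    pair_id: int  # links each joke to its edited counterpart


def build_paired_dataset(jokes: Iterable[str], unfun: Callable[[str], str]) -> List[Example]:
    """Pair every joke with a minimally edited, non-humorous version."""
    dataset: List[Example] = []
    for pair_id, joke in enumerate(jokes):
        edited = unfun(joke)  # hypothetical editor; an LLM call in practice
        # Drop pairs where the editor returned the text unchanged.
        if edited.strip() == joke.strip():
            continue
        dataset.append(Example(joke, 1, pair_id))
        dataset.append(Example(edited, 0, pair_id))
    return dataset


def pairwise_accuracy(dataset: List[Example], score: Callable[[str], float]) -> float:
    """Fraction of pairs where the scorer rates the joke strictly above its edit."""
    by_pair: Dict[int, Dict[int, str]] = {}
    for ex in dataset:
        by_pair.setdefault(ex.pair_id, {})[ex.label] = ex.text
    pairs = [p for p in by_pair.values() if 0 in p and 1 in p]
    if not pairs:
        return 0.0
    wins = sum(score(p[1]) > score(p[0]) for p in pairs)
    return wins / len(pairs)


if __name__ == "__main__":
    # Hypothetical stand-in for an LLM editor: a fixed lookup table.
    edits = {
        "I told my wife she draws her eyebrows too high. She looked surprised.":
            "I told my wife she draws her eyebrows too high. She looked annoyed.",
        "Local man unsure why he is still awake.":
            "Local man unsure why he is still awake.",  # failed edit, unchanged
    }
    data = build_paired_dataset(edits.keys(), lambda t: edits.get(t, t))
    # Toy scorer: treats "surprised" as a humor cue (purely illustrative).
    toy_score = lambda t: 1.0 if "surprised" in t else 0.0
    print(len(data), pairwise_accuracy(data, toy_score))  # 2 1.0
```

Keeping the `pair_id` link is the design point: because each edit differs only minimally from its joke, a classifier must pick up on the humorous content itself rather than on topic or style, which is what makes such pairs challenging.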


Code & Models

Repositories

zacharyhorvitz/getting-serious-with-llms (Official)


Taxonomy

Topics: Humor Studies and Applications · Comics and Graphic Narratives · Video Analysis and Summarization