Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models

Yousra Fettach; Guillaume Bied; Hannu Toivonen; Tijl De Bie

arXiv:2604.08757 · cs.CL · April 13, 2026

TL;DR

This paper benchmarks humor alignment in large language models by having them play Cards Against Humanity. It finds only modest alignment with human preferences and substantially stronger agreement among the models themselves, partly explained by position biases and content preferences.

Contribution

It introduces a novel benchmark for humor alignment in LLMs based on Cards Against Humanity (CAH) gameplay and analyzes the factors that shape model preferences.

Findings

01. Models outperform the random baseline in humor selection.

02. Models agree more with each other than with humans.

03. Position biases and content preferences influence humor judgments.
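The first two findings come down to simple agreement rates. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes each judge's picks are stored as a list of card indices (0 to 9) in a shared round order. The function names and toy picks are hypothetical. With ten candidate cards per round, a uniformly random picker matches any fixed choice, human or model, with probability 1/10.

```python
from itertools import combinations

# Hypothetical data layout: one list per judge, holding the index (0-9) of the
# card that judge picked in each round, in the same round order for all judges.
NUM_CANDIDATES = 10  # ten candidate cards per round, as stated in the abstract


def agreement(picks_a: list[int], picks_b: list[int]) -> float:
    """Fraction of rounds in which two judges picked the same card."""
    if len(picks_a) != len(picks_b) or not picks_a:
        raise ValueError("pick lists must be non-empty and equally long")
    same = sum(a == b for a, b in zip(picks_a, picks_b))
    return same / len(picks_a)


def random_baseline(num_candidates: int = NUM_CANDIDATES) -> float:
    """Expected agreement of a uniform random picker with any fixed choice."""
    return 1.0 / num_candidates


def pairwise_model_agreement(
    model_picks: dict[str, list[int]],
) -> dict[tuple[str, str], float]:
    """Agreement rate for every unordered pair of models."""
    return {
        (m1, m2): agreement(model_picks[m1], model_picks[m2])
        for m1, m2 in combinations(sorted(model_picks), 2)
    }


if __name__ == "__main__":
    # Toy, made-up picks for four rounds; not data from the paper.
    human = [3, 7, 0, 5]
    models = {"model_a": [3, 1, 0, 2], "model_b": [3, 1, 0, 9]}
    print("random baseline:", random_baseline())
    for name, picks in models.items():
        print(name, "vs human:", agreement(picks, human))
    print("model-model:", pairwise_model_agreement(models))
```

On these made-up picks, each model matches the human choice in 2 of 4 rounds (0.5), while the two models match each other in 3 of 4 (0.75). The toy data were chosen to mirror the shape of finding 02 and say nothing about the paper's actual numbers.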

Abstract

Humor is one of the most culturally embedded and socially significant dimensions of human communication, yet it remains largely unexplored as a dimension of Large Language Model (LLM) alignment. In this study, five frontier language models play the same Cards Against Humanity (CAH) games as human players, selecting the funniest response from a slate of ten candidate cards across 9,894 rounds. While all models exceed the random baseline, alignment with human preferences remains modest. More strikingly, the models agree with each other substantially more often than they agree with humans. We show that this inter-model agreement is partly explained by systematic position biases and content preferences, raising the question of whether LLM humor judgment reflects genuine preference or structural artifacts of inference and alignment.
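One common way to check for the position bias mentioned above is to count how often each of the ten presentation slots is chosen. Those counts can then be compared to a uniform distribution with a Pearson chi-square test. The abstract does not say which test the authors use, so treat this as an assumed, illustrative check. It is only meaningful if card order is randomized independently of content; otherwise slot preferences and content preferences are confounded. The helper names are hypothetical. The critical value is hard-coded for 10 slots (9 degrees of freedom) to avoid a SciPy dependency.

```python
from collections import Counter

NUM_SLOTS = 10  # ten candidate cards per round, as stated in the abstract
# Upper 5% critical value of the chi-square distribution with 9 degrees of
# freedom (NUM_SLOTS - 1); hard-coded so the sketch needs no SciPy.
CHI2_CRIT_DF9_ALPHA05 = 16.919


def slot_counts(picked_slots: list[int], num_slots: int = NUM_SLOTS) -> list[int]:
    """Count how often each 0-based presentation slot was chosen."""
    if any(not 0 <= s < num_slots for s in picked_slots):
        raise ValueError("slot index out of range")
    counts = Counter(picked_slots)
    return [counts[slot] for slot in range(num_slots)]


def chi_square_uniform(counts: list[int]) -> float:
    """Pearson chi-square statistic of observed counts against a uniform choice."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no picks to test")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def looks_position_biased(picked_slots: list[int]) -> bool:
    """Flag slot choices that deviate from uniform at the 5% level (10 slots only)."""
    return chi_square_uniform(slot_counts(picked_slots)) > CHI2_CRIT_DF9_ALPHA05
```

For example, a judge that always picks the first slot (`looks_position_biased([0] * 50)`) is flagged. A judge that spreads its picks evenly (`looks_position_biased(list(range(10)) * 5)`) is not.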
