Not All Jokes Land: Evaluating Large Language Models Understanding of Workplace Humor
Mohammadamin Shafiei, Hamidreza Saffari

TL;DR
This paper investigates how well large language models (LLMs) understand workplace humor. The authors build a dataset of professional humor statements and evaluate five LLMs on it, finding that the models often struggle to judge whether a joke is appropriate.
Contribution
The study introduces a new dataset of professional workplace humor statements, annotated with features that determine each statement's appropriateness, and uses it to assess how well LLMs judge humor appropriateness, highlighting gaps in current models.
Findings
LLMs often misjudge whether humor is appropriate in workplace contexts.
Current LLMs show limited understanding of professional humor.
The dataset provides a benchmark for evaluating how well AI systems understand humor.
Abstract
With the recent advances in Artificial Intelligence (AI) and Large Language Models (LLMs), the automation of daily tasks, such as automatic writing, is receiving increasing attention. Accordingly, efforts have focused on aligning LLMs with human values, yet humor, particularly the professional industrial humor used in workplaces, has been largely neglected. To address this, we develop a dataset of professional humor statements along with features that determine the appropriateness of each statement. Our evaluation of five LLMs shows that LLMs often struggle to judge the appropriateness of humor accurately.
Taxonomy
Topics: Humor Studies and Applications · Language, Metaphor, and Cognition · Language, Communication, and Linguistic Studies
