Engagement Undermines Safety: How Stereotypes and Toxicity Shape Humor in Language Models
Atharvan Dogra, Soumya Suvra Ghosal, Ameet Deshpande, Ashwin Kalyan, Dinesh Manocha

TL;DR
This study investigates how humor generation in large language models can inadvertently reinforce stereotypes and toxicity, revealing biases and structural issues in humor and safety alignment.
Contribution
It introduces a comprehensive evaluation framework linking humor, stereotypes, and toxicity in LLMs, highlighting bias amplification and structural embedding of harmful content.
Findings
Harmful outputs receive higher humor scores, especially with role-based prompts.
Harmful cues widen predictive uncertainty and, for some models, can make harmful punchlines more expected.
Satire generation in LLMs increases stereotypicality and toxicity, affecting human perceptions.
Abstract
Large language models are increasingly used for creative writing and engagement content, raising safety concerns about their outputs. Using humor generation as a testbed, this work evaluates how funniness optimization in modern LLM pipelines couples with harmful content by jointly measuring humor, stereotypicality, and toxicity. This is supplemented by an analysis of incongruity signals through information-theoretic metrics. Across six models, we observe that harmful outputs receive higher humor scores, which increase further under role-based prompting, indicating a bias amplification loop between generators and evaluators. Information-theoretic analyses show that harmful cues widen predictive uncertainty and, surprisingly, can even make harmful punchlines more expected for some models, suggesting structural embedding in learned humor distributions. External validation on an…
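The two information-theoretic quantities named in the abstract, predictive uncertainty and how "expected" a punchline is, are commonly formalized as Shannon entropy and surprisal. The sketch below is illustrative only and is not the paper's implementation: the function names, the toy next-token distributions, and the token labels are all hypothetical, and the abstract does not state which exact metrics or estimators the authors use.

```python
import math


def surprisal_bits(prob):
    """Surprisal (negative log-probability) of one outcome, in bits.

    Lower surprisal means the outcome was more expected.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def entropy_bits(dist):
    """Shannon entropy of a discrete distribution, in bits.

    Higher entropy means wider predictive uncertainty.
    """
    total = sum(dist.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in dist.values() if p > 0.0)


# Hypothetical next-token distributions (assumed numbers, not data from the paper).
neutral_context = {"punchline_a": 0.7, "punchline_b": 0.2, "punchline_c": 0.1}
cued_context = {"punchline_a": 0.4, "punchline_b": 0.3, "punchline_c": 0.3}

uncertainty_neutral = entropy_bits(neutral_context)
uncertainty_cued = entropy_bits(cued_context)

# Surprisal of the same candidate punchline under each context.
surprisal_neutral = surprisal_bits(neutral_context["punchline_c"])
surprisal_cued = surprisal_bits(cued_context["punchline_c"])
```

In this toy setup the cued context has both higher entropy and lower surprisal for `punchline_c`, which mirrors the pattern the abstract describes: uncertainty can widen overall while one particular continuation becomes more expected. With a real model, the probabilities would come from its next-token distribution rather than from hand-written dictionaries.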
Taxonomy
Topics: Humor Studies and Applications · Language, Metaphor, and Cognition · Psychology of Moral and Emotional Judgment
