Text Is Not All You Need: Multimodal Prompting Helps LLMs Understand Humor

Ashwin Baluja

arXiv:2412.05315 · cs.CL · December 10, 2024

TL;DR

This paper shows that multimodal prompting, which pairs a joke's text with its spoken form, improves large language models' ability to understand and explain humor. Humor is frequently multimodal, relying on phonetic ambiguity, rhythm, and timing.

Contribution

The study introduces a simple multimodal prompting method that adds speech cues, generated with an off-the-shelf text-to-speech (TTS) system, to improve humor understanding in LLMs beyond what text-only prompts achieve.

Findings

01

Multimodal prompts improve humor explanation accuracy.

02

Speech cues enhance LLM performance across datasets.

03

Multimodal approach outperforms text-only methods.

Abstract

While Large Language Models (LLMs) have demonstrated impressive natural language understanding capabilities across various text-based tasks, understanding humor has remained a persistent challenge. Humor is frequently multimodal, relying on phonetic ambiguity, rhythm and timing to convey meaning. In this study, we explore a simple multimodal prompting approach to humor understanding and explanation. We present an LLM with both the text and the spoken form of a joke, generated using an off-the-shelf text-to-speech (TTS) system. Using multimodal cues improves the explanations of humor compared to textual prompts across all tested datasets.
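
As a rough illustration of the prompting setup the abstract describes, the following Python sketch pairs a joke's text with a spoken rendering of it in a single prompt. It is a minimal sketch under stated assumptions, not the paper's implementation: the message schema (`type`, `text`, `format`, `data`), the instruction wording, and the names `build_multimodal_prompt` and `fake_tts` are hypothetical, and the TTS system is passed in as a plain callable so that any off-the-shelf engine could be substituted.

```python
import base64
from typing import Callable, Dict, List


def build_multimodal_prompt(
    joke: str,
    synthesize: Callable[[str], bytes],
) -> List[Dict[str, str]]:
    """Pair a joke's text with its spoken form in one prompt.

    `synthesize` stands in for an off-the-shelf TTS system: it takes the
    joke text and returns raw audio bytes. The two-part schema returned
    here is hypothetical and would need adapting to whichever
    audio-capable LLM API is actually used.
    """
    audio_bytes = synthesize(joke)
    # Base64-encode the audio so it can travel in a JSON-style payload.
    audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
    return [
        # Instruction wording is an assumption, not taken from the paper.
        {"type": "text", "text": f"Explain why this joke is funny:\n{joke}"},
        {"type": "audio", "format": "wav", "data": audio_b64},
    ]


def fake_tts(text: str) -> bytes:
    """Placeholder TTS used only to exercise the sketch; returns dummy bytes."""
    return b"RIFF" + text.encode("utf-8")


if __name__ == "__main__":
    parts = build_multimodal_prompt(
        "Why did the scarecrow win an award? He was outstanding in his field.",
        fake_tts,
    )
    print([part["type"] for part in parts])
```

A text-only baseline would send just the first part, so the audio part is the only thing that differs between the two conditions being compared.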

Taxonomy

Topics: Language, Metaphor, and Cognition · Humor Studies and Applications · American Literature and Humor Studies