Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models
Yuchen Su, Shaoxin Zhong, Yonghua Zhu, Ruofan Wang, Zijian Huang, Qiqi Wang, Na Zhao, Diana Benavides-Prado, Michael Witbrock

TL;DR
This paper introduces APUN-Bench, a comprehensive benchmark for assessing large audio-language models on their ability to understand spoken puns, highlighting significant performance gaps and challenges in this underexplored modality.
Contribution
It presents the first dedicated benchmark for audio pun understanding, providing a dataset and systematic evaluation of state-of-the-art models in this domain.
Findings
Significant performance gaps in pun recognition, localization, and interpretation.
Identification of positional biases affecting pun localization.
Analysis of error cases in pun meaning inference.
Abstract
Puns represent a typical linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Beyond text and images, audio plays a central role in human communication, yet datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored in pun research. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio-language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word location, and pun meaning inference. We conduct a deep analysis of APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key…
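To make the three-stage setup concrete, the following is a minimal sketch of how per-stage results might be tallied. It is not the authors' evaluation code: the `StageResult` record, its field names, the stage labels, and the use of case-insensitive exact match as the scoring rule are all assumptions made for illustration, and the source does not state how each stage is actually scored.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stage labels mirroring the three stages named in the abstract.
STAGES = ("recognition", "location", "inference")


@dataclass
class StageResult:
    """One model answer for one sample at one stage (illustrative schema)."""
    sample_id: str
    stage: str
    gold: str
    predicted: str


def stage_accuracy(results: List[StageResult]) -> Dict[str, float]:
    """Return exact-match accuracy per stage, skipping stages with no results.

    Exact match is an assumed scoring rule; free-text meaning inference
    would likely need a more lenient metric in practice.
    """
    totals = {stage: 0 for stage in STAGES}
    correct = {stage: 0 for stage in STAGES}
    for result in results:
        if result.stage not in totals:
            raise ValueError(f"unknown stage: {result.stage!r}")
        totals[result.stage] += 1
        # Compare after trimming whitespace and lowercasing.
        if result.gold.strip().lower() == result.predicted.strip().lower():
            correct[result.stage] += 1
    return {s: correct[s] / totals[s] for s in STAGES if totals[s] > 0}


if __name__ == "__main__":
    demo = [
        StageResult("s1", "recognition", "yes", "Yes"),
        StageResult("s2", "recognition", "no", "yes"),
        StageResult("s1", "location", "bank", "bank"),
    ]
    print(stage_accuracy(demo))
```

Reporting accuracy separately per stage, rather than as one pooled number, keeps a model that detects puns well but cannot locate or explain them from looking uniformly strong.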
Taxonomy
Topics: Humor Studies and Applications · Multisensory perception and integration · Language, Metaphor, and Cognition
