PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

Kun Ouyang; Yuanxin Liu; Shicheng Li; Yi Liu; Hao Zhou; Fandong Meng; Jie Zhou; Xu Sun

arXiv:2412.11906 · cs.CV · June 18, 2025

TL;DR

This paper introduces PunchBench, a benchmark for evaluating how well multimodal large language models (MLLMs) comprehend humor and sarcasm in image-caption pairs. It addresses three limitations of existing benchmarks (language shortcuts, limited question diversity, and narrow domain coverage) and proposes SC-CoQ, a strategy for improving punchline comprehension.

Contribution

The paper presents PunchBench, a new benchmark with diverse questions and broad domain coverage, and introduces SC-CoQ, a strategy for improving punchline comprehension in MLLMs.

Findings

01. There is a significant gap between MLLMs and humans in punchline comprehension.

02. The SC-CoQ strategy improves MLLMs' performance on PunchBench.

03. Generating synonymous and antonymous captions enhances evaluation accuracy.

Abstract

Multimodal punchlines, which involve humor or sarcasm conveyed in image-caption pairs, are a popular means of communication on online multimedia platforms. With the rapid development of multimodal large language models (MLLMs), it is essential to assess their ability to comprehend these punchlines effectively. However, existing benchmarks on punchline comprehension suffer from three major limitations: 1) language shortcuts that allow models to rely solely on text, 2) a lack of question diversity, and 3) a narrow focus on a specific domain of multimodal content (e.g., cartoons). To address these limitations, we introduce a multimodal Punchline comprehension Benchmark, named PunchBench, which is tailored for accurate and comprehensive evaluation of punchline comprehension. To enhance the evaluation accuracy, we generate synonymous and antonymous captions by modifying…

Videos

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension · underline

Taxonomy

Topics: Subtitles and Audiovisual Media · Speech and dialogue systems

Methods: Focus