NYK-MS: A Well-annotated Multi-modal Metaphor and Sarcasm Understanding Benchmark on Cartoon-Caption Dataset

Ke Chang; Hao Li; Junzhao Zhang; Yunfang Wu

arXiv:2409.01037·cs.CL·September 4, 2024

TL;DR

This paper introduces NYK-MS, a comprehensive multi-modal benchmark dataset for understanding metaphor and sarcasm in cartoons and captions, highlighting the challenges faced by current models and the importance of multimodal understanding.

Contribution

The creation of a well-annotated, multi-modal benchmark dataset for metaphor and sarcasm understanding, with extensive experiments revealing limitations of current models and the need for multimodal comprehension.

Findings

01

LLMs and LMMs perform poorly on classification tasks in zero-shot settings.

02

Model performance improves with scale on five tasks.

03

Augmentation and alignment enhance traditional model performance.

Abstract

Metaphor and sarcasm are common figurative expressions in everyday communication, especially on the Internet and in the memes popular among teenagers. We create a new benchmark named NYK-MS (NewYorKer for Metaphor and Sarcasm), which contains 1,583 samples for metaphor understanding tasks and 1,578 samples for sarcasm understanding tasks. The tasks cover whether a sample contains metaphor/sarcasm, which word or object carries the metaphor/sarcasm, what it satirizes, and why it contains metaphor/sarcasm. All 7 tasks are annotated by at least 3 annotators. We annotate the dataset over several rounds to improve consistency and quality, and use a GUI and GPT-4V to raise our efficiency. Based on the benchmark, we conduct extensive experiments. In the zero-shot experiments, we show that Large Language Models (LLMs) and Large Multi-modal Models (LMMs) can't do classification task…

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Natural Language Processing Techniques