Ring That Bell: A Corpus and Method for Multimodal Metaphor Detection in Videos

Khalid Alnajjar; Mika Hämäläinen; Shuo Zhang

arXiv:2301.01134 · cs.MM · January 4, 2023 · 1 citation

Open Access

TL;DR

This paper introduces the first multimodal metaphor corpus from videos with annotations and proposes a text-based detection method achieving 62% F1-score, exploring multimodal approaches and analyzing their limitations.

Contribution

It provides the first openly available multimodal metaphor dataset and a text-based detection method, highlighting challenges in visual cue utilization.

Findings

01

The text-based model achieves a 62% F1-score for metaphorical labels.

02

Multimodal methods did not outperform the text-only approach.

03

Visual cues are subtle and challenging for current models.

Abstract

We present the first openly available multimodal metaphor-annotated corpus. The corpus consists of videos, including audio and subtitles, that have been annotated by experts. Furthermore, we present a method for detecting metaphors in the new dataset based on the textual content of the videos. The method achieves a high F1-score (62%) for metaphorical labels. We also experiment with other modalities and multimodal methods; however, these methods did not outperform the text-based model. Our error analysis identifies cases where video could help disambiguate metaphors, but the visual cues are too subtle for our model to capture. The data is available on Zenodo.
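
For readers unfamiliar with the metric, the sketch below shows how an F1-score for a single label such as the metaphorical one is typically computed from precision and recall. It is a minimal illustration, not the authors' evaluation code: the function name `f1_for_label`, the label strings, and the toy annotations are all hypothetical and are not taken from the corpus.

```python
def f1_for_label(gold, pred, label):
    """Compute the F1-score of one label from gold and predicted label lists."""
    pairs = list(zip(gold, pred))
    # True positives: predicted the label and the gold annotation agrees.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: predicted the label but the gold annotation differs.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: gold annotation has the label but the prediction missed it.
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy annotations, for illustration only (not corpus data).
gold = ["metaphor", "literal", "metaphor", "literal", "metaphor"]
pred = ["metaphor", "metaphor", "literal", "literal", "metaphor"]

score = f1_for_label(gold, pred, "metaphor")
print(f"F1 for the 'metaphor' label: {score:.2f}")
```

Reporting F1 for the metaphorical label alone, rather than overall accuracy, keeps the score from being inflated by correct predictions on literal segments.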

Taxonomy

Topics: Language, Metaphor, and Cognition · Subtitles and Audiovisual Media