A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding
Dilara Torunoğlu-Selamet, Dogukan Arslan, Rodrigo Wilkens, Wei He, Doruk Eryiğit, Thomas Pickard, Adriana S. Pagano, Aline Villavicencio, Gülşen Eryiğit, Ágnes Abuczki, Aida Cardoso, Alesia Lazarenka, Dina Almassova, Amalia Mendes, Anna Kanellopoulou

TL;DR
This paper introduces XMPIE, a comprehensive multilingual and multimodal dataset of potentially idiomatic expressions across 34 languages, enabling evaluation of cross-lingual and cross-modal idiomatic understanding in NLP systems.
Contribution
The creation of XMPIE, a large-scale, high-quality benchmark dataset that supports cross-lingual and multimodal analysis of idiomatic expressions, filling a gap in NLP resources.
Findings
Enables comparison of idiomatic patterns across languages.
Facilitates evaluation of cross-modal idiomatic understanding.
Supports analysis of shared cultural and linguistic aspects.
Abstract
Potentially idiomatic expressions (PIEs) construe meanings inherently tied to the everyday experience of a given language community. As such, they constitute an interesting challenge for assessing the linguistic (and to some extent cultural) capabilities of NLP systems. In this paper, we present XMPIE, a parallel multilingual and multimodal dataset of potentially idiomatic expressions. The dataset, containing 34 languages and over ten thousand items, allows comparative analyses of idiomatic patterns among language-specific realisations and preferences in order to gather insights about shared cultural aspects. This parallel dataset makes it possible to evaluate model performance for a given PIE in different languages and to test whether idiomatic understanding in one language can be transferred to another. Moreover, the dataset supports the study of PIEs across textual and visual modalities, to measure to…
Taxonomy
Topics: Natural Language Processing Techniques · Language, Metaphor, and Cognition · Topic Modeling
