UoR-NCL at SemEval-2025 Task 1: Using Generative LLMs and CLIP Models for Multilingual Multimodal Idiomaticity Representation

Thanet Markchom; Tong Wu; Liting Huang; Huizhi Liang

arXiv:2502.20984·cs.CL·May 2, 2025

TL;DR

This paper combines generative LLMs with multilingual CLIP models to rank images against potentially idiomatic nominal compounds in English and Brazilian Portuguese. Representations built this way outperform those based solely on the original compounds.

Contribution

The work introduces a multimodal method that encodes LLM-generated idiomatic meanings with multilingual CLIP models and uses the resulting embeddings to rank images. Contrastive learning and data augmentation are applied to fine-tune these embeddings.
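The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cosine` and `rank_images` are hypothetical, the three-dimensional vectors are toy stand-ins for CLIP text and image embeddings, and cosine similarity is assumed as the scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_images(text_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the text embedding."""
    scores = {
        image_id: cosine(text_embedding, embedding)
        for image_id, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy stand-in for the embedding of an LLM-generated idiomatic meaning (assumed values).
meaning_embedding = [0.9, 0.1, 0.0]

# Toy stand-ins for candidate image embeddings (assumed values).
candidates = {
    "img_a": [0.1, 0.9, 0.0],
    "img_b": [0.8, 0.2, 0.1],
    "img_c": [0.0, 0.0, 1.0],
}

ranking = rank_images(meaning_embedding, candidates)
print(ranking)
```

In a real pipeline the toy vectors would be replaced by the outputs of a multilingual CLIP text encoder (for the generated meaning) and image encoder (for each candidate image).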

Findings

01 Representations built from LLM-generated idiomatic meanings outperform those based solely on the original nominal compounds.

02 Fine-tuning the embeddings yields less improvement than using them directly.

03 The approach effectively captures idiomatic meanings across languages.

Abstract

SemEval-2025 Task 1 focuses on ranking images based on their alignment with a given nominal compound that may carry idiomatic meaning in both English and Brazilian Portuguese. To address this challenge, this work uses generative large language models (LLMs) and multilingual CLIP models to enhance idiomatic compound representations. LLMs generate idiomatic meanings for potentially idiomatic compounds, enriching their semantic interpretation. These meanings are then encoded using multilingual CLIP models, serving as representations for image ranking. Contrastive learning and data augmentation techniques are applied to fine-tune these embeddings for improved performance. Experimental results show that multimodal representations extracted through this method outperformed those based solely on the original nominal compounds. The fine-tuning approach shows promising outcomes but is less…
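To make the contrastive fine-tuning idea concrete, here is a minimal sketch of an InfoNCE-style loss for one text embedding against one matching image and several non-matching images. The source does not specify the exact objective, so the loss form, the function name `info_nce`, and the temperature value of 0.07 are all assumptions chosen for illustration.

```python
import math


def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE-style loss for one positive pair and a list of negatives.

    sim_pos: similarity between the text and its matching image.
    sim_negs: similarities between the text and non-matching images.
    temperature: scaling factor (assumed value, not taken from the paper).
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    # Log-sum-exp with max subtraction for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    # Negative log-probability assigned to the positive pair.
    return log_denominator - logits[0]


# The loss shrinks as the positive pair becomes more similar than the negatives.
loss_well_aligned = info_nce(0.9, [0.1, 0.2])
loss_poorly_aligned = info_nce(0.3, [0.1, 0.2])
print(loss_well_aligned, loss_poorly_aligned)
```

Minimising a loss of this kind pulls the embedding of a generated meaning toward its matching image and pushes it away from the others, which is the general behaviour contrastive fine-tuning aims for.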

Taxonomy

Methods: Contrastive Learning · Contrastive Language-Image Pre-training