Ukrainian Visual Word Sense Disambiguation Benchmark

Yurii Laba; Yaryna Mohytych; Ivanna Rohulia; Halyna Kyryleyza; Hanna Dydyk-Meush; Oles Dobosevych; Rostyslav Hryniv

arXiv:2603.23627 · cs.CV · March 26, 2026

TL;DR

This paper introduces a Ukrainian Visual Word Sense Disambiguation benchmark that evaluates how well multilingual models identify the correct image for an ambiguous word given minimal context, and it highlights a performance gap relative to English.

Contribution

The paper presents the first Ukrainian Visual-WSD benchmark, built with a methodology adapted from existing benchmarks in other languages, and evaluates multilingual models on it, revealing a performance gap between Ukrainian and English.

Findings

1. All eight evaluated models underperform the zero-shot CLIP-based baseline.
2. Models perform worse on Ukrainian than on English.
3. The benchmark enables cross-language performance comparison.

Abstract

This study presents a benchmark for evaluating the Visual Word Sense Disambiguation (Visual-WSD) task in Ukrainian. The main goal of the Visual-WSD task is to identify, with minimal contextual information, the most appropriate representation of a given ambiguous word from a set of ten images. To construct this benchmark, we followed a methodology similar to that proposed by (CITATION), who previously introduced benchmarks for the Visual-WSD task in English, Italian, and Farsi. This approach allows us to incorporate the Ukrainian benchmark into a broader framework for cross-language model performance comparisons. We collected the benchmark data semi-automatically and refined it with input from domain experts. We then assessed eight multilingual and multimodal large language models using this benchmark. All tested models performed worse than the zero-shot CLIP-based baseline model…
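The task described in the abstract, choosing one image out of ten candidates for an ambiguous word with minimal context, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the embeddings are toy vectors standing in for the output of a text encoder and an image encoder (no real model is called), and hit@1 and mean reciprocal rank are common choices for ranking tasks rather than metrics confirmed by this page.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(text_vec, image_vecs):
    """Return candidate indices ordered from most to least similar to the text."""
    scores = [cosine(text_vec, img) for img in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def hit_at_1(rankings, gold):
    """Fraction of items whose top-ranked candidate is the gold image."""
    return sum(r[0] == g for r, g in zip(rankings, gold)) / len(gold)


def mean_reciprocal_rank(rankings, gold):
    """Average of 1 / (1-based rank of the gold image) over all items."""
    return sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold)) / len(gold)


# Toy item: one text embedding and ten candidate image embeddings.
# In a real setup these vectors would come from a multimodal encoder
# (assumption: text and images are embedded in the same space).
text_vec = [1.0, 0.0]
image_vecs = [[math.cos(0.3 * k), math.sin(0.3 * k)] for k in range(10)]

ranking = rank_candidates(text_vec, image_vecs)
print("ranking:", ranking)
print("hit@1:", hit_at_1([ranking], [0]))
print("MRR:", mean_reciprocal_rank([ranking], [2]))
```

A zero-shot baseline in this style needs no task-specific training: it only ranks the candidates by similarity to the text, which is why it serves as a convenient reference point for the larger models.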

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Data Visualization and Analytics