Large Language Models and Multimodal Retrieval for Visual Word Sense Disambiguation

Anastasia Kritharoula; Maria Lymperaiou; Giorgos Stamou

arXiv:2310.14025 · cs.CL · April 23, 2024 · 1 citation

TL;DR

This paper explores the use of multimodal transformers and Large Language Models to improve visual word sense disambiguation by combining various retrieval and reasoning approaches, leading to competitive results.

Contribution

It introduces a comprehensive approach combining multimodal retrieval, LLM-based knowledge enhancement, and learn-to-rank models for VWSD, advancing the state of the art.

Findings

01

Multimodal transformer methods improve retrieval accuracy.

02

LLMs with Chain-of-Thought enhance explainability.

03

Combining modules via learn-to-rank yields competitive performance.

Abstract

Visual Word Sense Disambiguation (VWSD) is a novel, challenging task whose goal is to retrieve, from a set of candidate images, the one that best represents the meaning of an ambiguous word within a given context. In this paper, we take a substantial step towards exploring this task by applying a varied set of approaches. Since VWSD is primarily a text-image retrieval task, we explore the latest transformer-based methods for multimodal retrieval. Additionally, we utilize Large Language Models (LLMs) as knowledge bases to enhance the given phrases and resolve ambiguity related to the target word. We also study VWSD as a unimodal problem by converting it to text-to-text and image-to-image retrieval, as well as question-answering (QA), to fully explore the capabilities of relevant models. To tap into the implicit knowledge of LLMs, we experiment with Chain-of-Thought (CoT)…
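The retrieval framing in the abstract can be illustrated with a minimal sketch: embed the phrase and each candidate image in a shared space, then rank the candidates by similarity to the phrase. Everything here is an assumption made for illustration, not the authors' implementation: the three-dimensional vectors are hand-picked toy values standing in for the output of a multimodal encoder, and the phrase, file names, and the helpers `cosine` and `rank_candidates` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(phrase_vec, image_vecs):
    """Return candidate names sorted from most to least similar to the phrase."""
    scores = {name: cosine(phrase_vec, vec) for name, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values; a real system would obtain these
# from a multimodal encoder applied to the phrase and the images).
phrase_vec = [0.9, 0.1, 0.2]  # hypothetical phrase containing the ambiguous word
image_vecs = {
    "tree.jpg": [0.8, 0.2, 0.1],
    "galaxy.jpg": [0.1, 0.9, 0.3],
    "statue.jpg": [0.2, 0.1, 0.9],
}

ranking = rank_candidates(phrase_vec, image_vecs)
print(ranking[0])  # the top-ranked candidate is the predicted image
```

In this framing, the LLM-based phrase enhancement described in the abstract would change only the text that is embedded into `phrase_vec`; the ranking step itself stays the same.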

Code & Models

Repositories

anastasiakrith/multimodal-retrieval-for-vwsd
Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Sparse Evolutionary Training