UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models

Liu Qi, He Yongyi, Lian Defu, Zheng Zhi, Xu Tong, Liu Che, and Chen Enhong

arXiv:2407.16160 · cs.AI · August 22, 2024

TL;DR

UniMEL introduces a unified LLM-based framework for multimodal entity linking that effectively integrates textual and visual information, achieving state-of-the-art results while requiring minimal fine-tuning.

Contribution

The paper presents UniMEL, a novel framework leveraging Large Language Models for multimodal entity linking, simplifying the process and enhancing performance across benchmarks.

Findings

1. Achieves state-of-the-art performance on three datasets.

2. Effectively integrates multimodal information with minimal fine-tuning.

3. Verifies the importance of each module through ablation studies.

Abstract

Multimodal Entity Linking (MEL) is a crucial task that aims to link ambiguous mentions within multimodal contexts to their referent entities in a multimodal knowledge base, such as Wikipedia. Existing methods rely heavily on complex mechanisms and extensive model tuning to model multimodal interaction on specific datasets. However, these methods overcomplicate the MEL task and overlook visual semantic information, which makes them costly and hard to scale. Moreover, they cannot resolve issues such as textual ambiguity, redundancy, and noisy images, which severely degrade their performance. Fortunately, the advent of Large Language Models (LLMs) with robust capabilities in text understanding and reasoning, particularly Multimodal Large Language Models (MLLMs) that can process multimodal inputs, provides new insights into addressing this challenge.…

Code & Models

Repositories

javkonline/unimel (PyTorch, official)
