I2CR: Intra- and Inter-modal Collaborative Reflections for Multimodal Entity Linking

Ziyan Liu; Junwen Li; Kaiwen Li; Tong Ruan; Chao Wang; Xinyan He; Zongyu Wang; Xuezhi Cao; Jingping Liu

arXiv:2508.02243·cs.CV·August 5, 2025

TL;DR

This paper introduces I2CR, an LLM-based multimodal entity linking framework that relies on text first and falls back to iterative reasoning over visual clues only when text is insufficient, outperforming existing methods on three datasets.

Contribution

The paper presents a new LLM-based framework that uses intra- and inter-modal reflections with multi-round reasoning, addressing limitations of previous one-time visual feature extraction methods.

Findings

01

Outperforms state-of-the-art methods on three datasets

02

Improves over prior state-of-the-art methods by 3.2%, 5.1%, and 1.6% on the three datasets, respectively

03

Demonstrates effectiveness of iterative visual reasoning in multimodal linking

Abstract

Multimodal entity linking plays a crucial role in a wide range of applications. Large language model-based methods have recently become the dominant paradigm for this task, leveraging both textual and visual modalities to enhance performance. Despite their success, these methods still face two challenges: they incorporate image data even in scenarios where it is unnecessary, and they rely on a one-time extraction of visual features, both of which can undermine their effectiveness and accuracy. To address these challenges, we propose a novel LLM-based framework for the multimodal entity linking task, called Intra- and Inter-modal Collaborative Reflections (I2CR). This framework prioritizes leveraging text information to address the task. When intra- and inter-modality evaluations indicate that text alone is insufficient to link the correct entity, it employs a multi-round…
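
The text-first control flow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract is truncated before it details the multi-round step, so every name here (`link_entity`, `propose`, `intra_check`, `inter_check`, `extract_clue`, `max_rounds`) and the toy data are hypothetical, and returning `None` when the rounds run out is an assumption.

```python
from typing import Any, Callable, List, Optional


def link_entity(
    text: str,
    image: Any,
    propose: Callable[[str, List[str]], str],
    intra_check: Callable[[str, str], bool],
    inter_check: Callable[[str, Any], bool],
    extract_clue: Callable[[Any, int], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Hypothetical text-first linker with a bounded visual fallback."""
    clues: List[str] = []  # visual clues gathered so far (empty in round 0)
    for round_idx in range(max_rounds + 1):
        # Round 0 uses text only; later rounds also see the collected clues.
        candidate = propose(text, clues)
        # Accept only if both the intra-modal (text) and inter-modal
        # (image) evaluations agree with the candidate.
        if intra_check(candidate, text) and inter_check(candidate, image):
            return candidate
        # Otherwise pull one more visual clue and try again.
        if round_idx < max_rounds:
            clues.append(extract_clue(image, round_idx))
    return None  # assumption: give up once the round budget is spent


# Toy stand-ins for the LLM and vision components (all hypothetical).
def toy_propose(text: str, clues: List[str]) -> str:
    if any("basketball" in clue for clue in clues):
        return "Jordan (basketball)"
    return "Jordan (country)"


def toy_intra_check(candidate: str, text: str) -> bool:
    # The candidate's surface name must appear in the mention text.
    return candidate.split(" (")[0] in text


def toy_inter_check(candidate: str, image: Any) -> bool:
    # The image's coarse label must be compatible with the candidate.
    return image["label"] in candidate


def toy_extract_clue(image: Any, round_idx: int) -> str:
    captions = image["captions"]
    return captions[round_idx % len(captions)]


toy_image = {"label": "basketball", "captions": ["a basketball player dunking"]}
result = link_entity(
    "Jordan scored 30 points",
    toy_image,
    toy_propose,
    toy_intra_check,
    toy_inter_check,
    toy_extract_clue,
)
```

In this toy run the text-only round proposes a candidate that fails the image check, so one visual clue is added and the second round links the mention. With `max_rounds=0` the sketch never consults the image for clues and returns `None` when the text-only candidate is rejected.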
