Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
Yikai Zhang, Qianyu He, Xintao Wang, Siyu Yuan, Jiaqing Liang, Yanghua Xiao

TL;DR
This paper introduces COG, a two-stage concept-guided framework for vision-language models that grounds long-tailed entities more accurately and provides explainability, addressing the noise and data scarcity that arise when building large-scale Multi-Modal Knowledge Graphs.
Contribution
The paper proposes a novel two-stage framework, COG, that leverages concept guidance to enhance long-tailed entity grounding and introduces a new dataset for evaluation.
Findings
COG improves accuracy in long-tailed entity recognition.
The framework offers explainability and human verification capabilities.
In the reported experiments, COG outperforms baseline methods.
Abstract
Multi-Modal Knowledge Graphs (MMKGs) have proven valuable for various downstream tasks. However, scaling them up is challenging because building large-scale MMKGs often introduces mismatched images (i.e., noise). Most entities in KGs belong to the long tail, meaning there are few images of them available online. This scarcity makes it difficult to determine whether a found image matches the entity. To address this, we draw on the Triangle of Reference Theory and suggest enhancing vision-language models with concept guidance. Specifically, we introduce COG, a two-stage framework with COncept-Guided vision-language models. The framework comprises a Concept Integration module, which effectively identifies image-text pairs of long-tailed entities, and an Evidence Fusion module, which offers explainability and enables human verification. To demonstrate the effectiveness of COG, we create a…
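The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the weighted-average fusion rule, the weights, the threshold, and the input scores are all hypothetical, and the similarity scores are assumed to come from some vision-language model that is not shown here. The sketch only shows how a concept-level signal might rescue a long-tailed entity whose name alone matches an image weakly, and how keeping per-evidence scores leaves a trace a human can verify.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str   # hypothetical label, e.g. "entity_name" or "concept"
    score: float  # similarity in [0, 1], assumed to come from a VLM


def concept_integration(entity_score: float, concept_score: float) -> list[Evidence]:
    """Stage 1 (toy): keep the entity-level and concept-level signals separate."""
    return [
        Evidence("entity_name", entity_score),
        Evidence("concept", concept_score),
    ]


def evidence_fusion(
    evidence: list[Evidence],
    weights: dict[str, float],
    threshold: float = 0.5,
) -> dict:
    """Stage 2 (toy): weighted average plus a human-readable trace."""
    total_weight = sum(weights[e.source] for e in evidence)
    fused = sum(weights[e.source] * e.score for e in evidence) / total_weight
    trace = [f"{e.source}: {e.score:.2f} (weight {weights[e.source]})" for e in evidence]
    return {"match": fused >= threshold, "score": fused, "trace": trace}


# Made-up scores: the entity name matches the image weakly, the concept strongly.
result = evidence_fusion(
    concept_integration(entity_score=0.2, concept_score=0.9),
    weights={"entity_name": 0.4, "concept": 0.6},
)
```

Here the name-only score would fall below the threshold, while the fused score does not; the `trace` list records which signal drove the decision.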
Taxonomy
Topics: Topic Modeling · Data Quality and Management · Multimodal Machine Learning Applications
