Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model

Li Yuan; Yi Cai; Junsheng Huang

arXiv:2410.14225 · cs.CL · March 25, 2025

Open Access

TL;DR

This paper introduces KECPM, a novel knowledge-enhanced cross-modal prompt model that improves few-shot joint multimodal entity-relation extraction by generating supplementary background knowledge with large language models.

Contribution

The paper proposes KECPM, a two-stage method that dynamically formulates prompts and merges auxiliary knowledge to enhance few-shot JMERE performance, addressing data scarcity issues.

Findings

01 KECPM outperforms strong baselines in F1 score.

02 The approach effectively incorporates background knowledge.

03 Qualitative analyses confirm the model's interpretability and robustness.

Abstract

Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts. Existing methods for JMERE require large amounts of labeled data, but gathering and annotating fine-grained multimodal data for JMERE poses significant challenges. We first construct diverse and comprehensive multimodal few-shot datasets fitted to the original data distribution. To address the insufficient information in the few-shot setting, we introduce the Knowledge-Enhanced Cross-modal Prompt Model (KECPM) for JMERE, which guides a large language model to generate supplementary background knowledge. Our proposed method comprises two stages: (1) a knowledge ingestion…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: ALIGN