Prompt Me Up: Unleashing the Power of Alignments for Multimodal Entity and Relation Extraction

Xuming Hu; Junzhe Chen; Aiwei Liu; Shiao Meng; Lijie Wen; Philip S. Yu

arXiv:2310.16822 · cs.CL · October 26, 2023 · 2 cites

TL;DR

This paper introduces a self-supervised pre-training approach that uses image-caption alignments to improve multimodal entity and relation extraction, reporting an average 3.41% F1 improvement over prior SOTA across three datasets.

Contribution

The paper proposes new pre-training objectives that align entities with objects and relations with images, using unlabeled image-caption pairs as self-supervised signals for extraction.

Findings

01  Achieved an average 3.41% F1 improvement over prior SOTA.

02  The method is orthogonal to existing multimodal fusion techniques and further improves them when applied on top.

03  Demonstrated effectiveness across three different datasets.

Abstract

How can we better extract entities and relations from text? Using multimodal extraction with images and text obtains more signals for entities and relations, and aligns them through graphs or hierarchical fusion, aiding in extraction. Despite attempts at various fusions, previous works have overlooked many unlabeled image-caption pairs, such as NewsCLIPing. This paper proposes innovative pre-training objectives for entity-object and relation-image alignment, extracting objects from images and aligning them with entity and relation prompts for soft pseudo-labels. These labels are used as self-supervised signals for pre-training, enhancing the ability to extract entities and relations. Experiments on three datasets show an average 3.41% F1 improvement over prior SOTA. Additionally, our method is orthogonal to previous multimodal fusions, and using it on prior SOTA fusions further improves…
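The soft pseudo-label idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the embedding vectors, the temperature value, and the function names `soft_pseudo_labels` and `soft_cross_entropy` are all assumptions made for illustration. The sketch assumes that an object extracted from an image and a set of entity prompts have already been encoded into vectors, scores the object against each prompt with cosine similarity, and turns the scores into a probability distribution that could serve as a soft training target.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_pseudo_labels(object_vec, prompt_vecs, temperature=0.1):
    """Softmax over object-prompt similarities (temperature is an assumed value)."""
    scores = [cosine(object_vec, p) / temperature for p in prompt_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(pred_probs, soft_targets, eps=1e-12):
    """Cross-entropy of predicted probabilities against soft targets."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_targets, pred_probs))


# Toy embeddings (hypothetical): one image object and three entity-type prompts.
object_vec = [0.9, 0.1, 0.0]
prompt_vecs = [
    [1.0, 0.0, 0.0],  # e.g. a "person" prompt
    [0.0, 1.0, 0.0],  # e.g. a "location" prompt
    [0.0, 0.0, 1.0],  # e.g. an "organization" prompt
]

labels = soft_pseudo_labels(object_vec, prompt_vecs)
uniform_prediction = [1.0 / 3] * 3
loss = soft_cross_entropy(uniform_prediction, labels)
```

Here `labels` concentrates most of its mass on the first prompt because the toy object vector points mostly in that direction, and `loss` measures how far a uniform prediction is from that soft target. In a real pipeline the vectors would come from image and text encoders, and the choice of encoder, prompt wording, and temperature would all affect the resulting labels.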

Code & Models

Repositories

thu-bpm/promu (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques