Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck

Shiyao Cui; Jiangxia Cao; Xin Cong; Jiawei Sheng; Quangang Li; Tingwen Liu; Jinqiao Shi

arXiv:2304.02328·cs.MM·February 12, 2024·1 cites

Open Access

TL;DR

This paper introduces a novel multimodal information bottleneck approach to improve entity recognition and relation extraction by reducing noise and aligning representations across text and images, achieving state-of-the-art results.

Contribution

It is the first to apply variational information bottleneck estimation to multimodal entity and relation extraction, addressing modality-noise and modality-gap issues.

Findings

01

Achieves state-of-the-art performance on three benchmarks.

02

Effectively reduces modality-noise in multimodal tasks.

03

Improves semantic alignment between text and images.

Abstract

This paper studies multimodal named entity recognition (MNER) and multimodal relation extraction (MRE), which are important for multimedia social platform analysis. The core of MNER and MRE lies in incorporating evident visual information to enhance textual semantics, where two issues inherently demand investigation. The first issue is modality-noise, where the task-irrelevant information in each modality may act as noise that misleads the task prediction. The second issue is modality-gap, where representations from different modalities are inconsistent, preventing semantic alignment between the text and image. To address these issues, we propose a novel method for MNER and MRE by Multi-Modal representation learning with Information Bottleneck (MMIB). For the first issue, a refinement-regularizer probes the information-bottleneck principle to balance the predictive…
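
The information-bottleneck principle named in the abstract is commonly trained in its variational form as a task loss plus a weighted KL term that compresses the learned representation. The following is a minimal Python sketch of that generic formulation, not the MMIB implementation: the function names, the diagonal-Gaussian encoder, the standard-normal prior, and the weight `beta` are all assumptions made for illustration.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one representation vector.

    Assumption: the encoder outputs a mean and a log-variance per dimension.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so that sampling
    stays differentiable with respect to mu and log_var in an autodiff setting."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def vib_loss(task_loss, mu, log_var, beta=1e-3):
    """Generic variational information-bottleneck objective.

    task_loss rewards predictive information; the KL term penalizes
    information kept in the representation. beta (hypothetical default)
    sets the trade-off between the two.
    """
    return task_loss + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)
    print("sampled z:", z)
    print("loss:", vib_loss(task_loss=1.0, mu=mu, log_var=log_var, beta=0.1))
```

A larger `beta` pushes the representation toward the prior and discards more input detail, which is the mechanism by which a bottleneck of this kind can suppress task-irrelevant information; a smaller `beta` favors the task loss.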

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques