Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

Feng Chen; Jiajia Liu; Kaixiang Ji; Wang Ren; Jian Wang; Jingdong Wang

arXiv:2308.02570 · cs.LG · August 8, 2023

Open Access

TL;DR

This paper introduces BGA-MNER, a bidirectional generative alignment approach for multimodal named entity recognition. It captures implicit entity-object relations through cross-modal generation (image2text and text2image) combined with a stage-refined context sampler.

Contribution

It proposes a novel bidirectional generative alignment framework that improves multimodal NER by modeling implicit relations without relying on image input at inference.

Findings

01. Achieves state-of-the-art results on two benchmarks.

02. Effectively captures implicit entity-object relations.

03. Improves robustness with stage-refined context sampling.

Abstract

The challenge posed by multimodal named entity recognition (MNER) is mainly two-fold: (1) bridging the semantic gap between text and image and (2) matching each entity with its associated object in the image. Existing methods fail to capture the implicit entity-object relations because the corresponding annotations are lacking. In this paper, we propose a bidirectional generative alignment method named BGA-MNER to tackle these issues. BGA-MNER consists of image2text and text2image generation with respect to the entity-salient content in the two modalities. It jointly optimizes the bidirectional reconstruction objectives, which aligns the implicit entity-object relations under these direct and powerful constraints. Furthermore, image-text pairs usually contain unmatched components, which are noisy for generation. A stage-refined context sampler is proposed to extract the matched…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
