Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition

Jielong Tang, Zhenxing Wang, Ziyang Gong, Jianxing Yu, Xiangwei Zhu, and Jian Yin

arXiv:2407.21033 · cs.IR · January 28, 2025

TL;DR

This paper introduces MQSPN, a novel set prediction network for grounded multimodal named entity recognition, effectively modeling intra- and inter-entity relationships to improve extraction accuracy.

Contribution

The paper proposes a unified framework with learnable queries and a fusion network to better model relationships in GMNER, surpassing previous methods.

Findings

01 Achieves state-of-the-art results on benchmark datasets.

02 Effectively models intra- and inter-entity relationships.

03 Outperforms existing unified approaches.

Abstract

Grounded Multimodal Named Entity Recognition (GMNER) is an emerging information extraction (IE) task that aims to simultaneously extract entity spans, entity types, and the corresponding visual regions of entities from given sentence-image pairs. Recent unified methods that employ machine reading comprehension or sequence generation-based frameworks show limitations on this difficult task. The former, which relies on human-designed type queries, struggles to differentiate ambiguous entities, such as Jordan (Person) and off-White x Jordan (Shoes). The latter, which follows a one-by-one decoding order, suffers from exposure bias. We maintain that these works misunderstand the relationships of multimodal entities. To tackle these issues, we propose a novel unified framework named Multi-grained Query-guided Set Prediction Network (MQSPN) to learn appropriate relationships at intra-entity and inter-entity…
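
The task output described in the abstract (a span, a type, and a visual region per entity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `GroundedEntity` class, the `(x1, y1, x2, y2)` box convention, the token indices, and the box coordinates are hypothetical and are not taken from the paper or the MQSPN repository. The point is only that treating predictions as a set makes the result independent of decoding order, in contrast to one-by-one decoding. Exact region equality is a simplification; a real evaluation would likely compare regions by overlap.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

# All names below are hypothetical and for illustration only;
# they are not taken from the MQSPN repository.
Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) pixel box


@dataclass(frozen=True)
class GroundedEntity:
    span: Tuple[int, int]   # token indices: start inclusive, end exclusive
    entity_type: str        # e.g. "Person" or "Shoes", as in the abstract
    region: Optional[Box]   # None if the entity has no visible region


def as_entity_set(entities: Iterable[GroundedEntity]) -> FrozenSet[GroundedEntity]:
    """Collapse predictions into an unordered set of (span, type, region)."""
    return frozenset(entities)


def exact_match_f1(pred: Iterable[GroundedEntity],
                   gold: Iterable[GroundedEntity]) -> float:
    """F1 where a prediction counts only if span, type, and region all match.

    Simplification: regions are compared by exact equality, not by overlap.
    """
    pred_set, gold_set = as_entity_set(pred), as_entity_set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Made-up sentence tokens: ["Jordan", "wears", "off-White", "x", "Jordan"]
person = GroundedEntity(span=(0, 1), entity_type="Person",
                        region=(10.0, 20.0, 110.0, 220.0))
shoes = GroundedEntity(span=(2, 5), entity_type="Shoes",
                       region=(40.0, 180.0, 90.0, 230.0))

gold = [person, shoes]
# The same entities in a different order form the same set.
same_order_free = as_entity_set([shoes, person]) == as_entity_set(gold)
full_score = exact_match_f1([shoes, person], gold)
partial_score = exact_match_f1([person], gold)
```

In this sketch the two mentions of "Jordan" stay distinct because they differ in span, type, and region, which is the kind of ambiguity the abstract highlights.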

Code & Models

Repositories

tangjielong928/mqspn (PyTorch, official)

Videos

Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Sparse Evolutionary Training