Multimodal Relational Triple Extraction with Query-based Entity Object Transformer
Lei Hei, Ning An, Tingjing Liao, Qi Ma, Jiaqi Wang, Feiliang Ren

TL;DR
This paper introduces a novel multimodal relation extraction task and a query-based model that jointly extracts entity-object triples from image-text pairs, improving accuracy and efficiency over previous methods.
Contribution
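The paper proposes a new task (Multimodal Entity-Object Relational Triple Extraction), a dataset adapted from MORE, and QEOT, a query-based model with a selective attention mechanism that jointly extracts entity spans, relations, and object regions from image-text pairs.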
Findings
QEOT outperforms existing baselines by 8.06%
Introduces a new dataset with 20,264 triples across 21 relation types, averaging 5.75 triples per image-text pair
Achieves state-of-the-art performance on the proposed task
Abstract
Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge graphs. Recent studies focus on extracting the relation type with entity pairs present in different modalities, such as one entity in the text and another in the image. However, existing approaches require entities and objects given beforehand, which is costly and impractical. To address the limitation, we propose a novel task, Multimodal Entity-Object Relational Triple Extraction, which aims to extract all triples (entity span, relation, object region) from image-text pairs. To facilitate this study, we modified a multimodal relation extraction dataset MORE, which includes 21 relation types, to create a new dataset containing 20,264 triples, averaging 5.75 triples per image-text pair. Moreover, we propose QEOT, a query-based model with a selective attention mechanism, to dynamically explore the…
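The triple format described in the abstract (entity span, relation, object region) can be pictured with a small data structure. The following is only a minimal sketch: the class names, the token-index span, the pixel bounding box, and the relation label are all assumptions made for illustration, not the dataset's actual schema or the authors' code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class EntitySpan:
    """Text-side entity as a half-open token range [start, end) (assumed layout)."""
    start: int
    end: int


@dataclass(frozen=True)
class ObjectRegion:
    """Image-side object as a box (x_min, y_min, x_max, y_max) in pixels (assumed layout)."""
    box: Tuple[float, float, float, float]


@dataclass(frozen=True)
class Triple:
    """One (entity span, relation, object region) triple for an image-text pair."""
    entity: EntitySpan
    relation: str  # one of the dataset's relation types; the label used below is made up
    obj: ObjectRegion


def span_text(tokens: Sequence[str], span: EntitySpan) -> str:
    """Recover the surface string of an entity span from the tokenized sentence."""
    return " ".join(tokens[span.start:span.end])


def average_triples_per_pair(triples_per_pair: List[List[Triple]]) -> float:
    """Mean number of triples per image-text pair (0.0 for an empty collection)."""
    if not triples_per_pair:
        return 0.0
    return sum(len(triples) for triples in triples_per_pair) / len(triples_per_pair)


# Hypothetical example: one sentence, one triple linking a text span to an image box.
tokens = ["Alice", "Smith", "rides", "a", "bike"]
example = Triple(
    entity=EntitySpan(start=0, end=2),
    relation="example_relation",
    obj=ObjectRegion(box=(10.0, 20.0, 110.0, 220.0)),
)
```

Under this layout, a model for the task would output a list of such triples per image-text pair instead of classifying a relation for a pre-given entity and object, which is the distinction the abstract draws from earlier multimodal relation extraction setups.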
Taxonomy
Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Focus
