IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling
Yilin Wen, Biao Luo, Yuqian Zhao

TL;DR
This paper introduces IMKGA-SM, a novel interpretable model for multimodal knowledge graph link prediction that effectively combines multimodal data, models the task as a reinforcement learning problem, and outperforms state-of-the-art methods.
Contribution
The paper proposes a new multimodal link prediction model that integrates fine-grained fusion, reinforcement learning, and interpretability mechanisms, addressing challenges of complex data and sparse training.
Findings
IMKGA-SM outperforms SOTA baselines on multiple datasets.
Vgg16 and OCR provide effective multimodal feature extraction.
Novel sequence modeling approach enhances interpretability and accuracy.
Abstract
Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data. However, with complex multimodal information and sparse training data, most methods find it difficult to achieve interpretability and high accuracy simultaneously. To address this difficulty, a new model is developed in this paper, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM). First, a multimodal fine-grained fusion method is proposed, and Vgg16 and Optical Character Recognition (OCR) techniques are adopted to effectively extract visual features and embedded text from images. Then, the knowledge graph link prediction task is modelled as an offline reinforcement learning Markov decision model, which is then abstracted into a unified sequence framework. An interactive perception-based reward…
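To make the "offline reinforcement learning abstracted into a sequence" idea more concrete, here is a minimal sketch of how a walk over a knowledge graph could be flattened into a token sequence. It is an illustration under stated assumptions, not the paper's implementation: the toy triples, the function names (`rollout`, `returns_to_go`, `to_sequence`), the sparse 0/1 reward, and the (return-to-go, state, action) token layout are all hypothetical choices borrowed from the general return-conditioned style of offline RL sequence modeling. The paper's own reward design is not reproduced here.

```python
from typing import List, Tuple

# Hypothetical toy knowledge graph: (head, relation, tail) triples.
TRIPLES = [
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Paris", "located_in", "Europe"),
]

Action = Tuple[str, str]   # (relation, tail entity)
Step = Tuple[str, Action]  # (current entity, chosen action)


def actions_from(entity: str) -> List[Action]:
    """All outgoing edges of an entity, viewed as available actions."""
    return [(r, t) for h, r, t in TRIPLES if h == entity]


def rollout(start: str, relations: List[str]) -> Tuple[List[Step], str]:
    """Follow the given relations from `start`; stop if one is unavailable."""
    steps: List[Step] = []
    current = start
    for rel in relations:
        matches = [a for a in actions_from(current) if a[0] == rel]
        if not matches:
            break
        steps.append((current, matches[0]))
        current = matches[0][1]
    return steps, current


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted sum of rewards from each step to the end of the path."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


def to_sequence(steps: List[Step], rewards: List[float]) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) tokens for a sequence model."""
    seq: List[Tuple[str, object]] = []
    for g, (state, (rel, tail)) in zip(returns_to_go(rewards), steps):
        seq.extend([("rtg", g), ("state", state), ("action", f"{rel}->{tail}")])
    return seq


if __name__ == "__main__":
    # Query: which entity answers (Paris, capital_of -> member_of, ?)
    steps, end = rollout("Paris", ["capital_of", "member_of"])
    # Assumed sparse reward: 1.0 only on the step that reaches the target "EU".
    rewards = [0.0, 1.0 if end == "EU" else 0.0]
    for token in to_sequence(steps, rewards):
        print(token)
```

In a setup like this, the path of (state, action) tokens doubles as a human-readable explanation of how the predicted answer was reached, which is one plausible reading of how a sequence formulation can support interpretability.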
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Text and Document Classification Technologies
