Multimodal Reasoning with Multimodal Knowledge Graph

Junlin Lee; Yequan Wang; Jing Li; Min Zhang

arXiv:2406.02030·cs.CL·June 6, 2024

TL;DR

This paper introduces MR-MKG, a novel method that uses multimodal knowledge graphs to improve large language models' ability to perform multimodal reasoning, reducing hallucinations and enhancing understanding across image and text modalities.

Contribution

The paper presents a new approach leveraging multimodal knowledge graphs and a relation graph attention network to significantly enhance multimodal reasoning in LLMs with minimal additional parameters.

Findings

01. MR-MKG outperforms previous models on multimodal question answering.

02. It achieves these results while training only a small set of parameters, about 2.25% of the LLM's parameter count.

03. The authors construct a new MMKG-grounded dataset for training and evaluation.

Abstract

Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs. Some approaches have sought to mitigate these issues by employing textual knowledge graphs, but their singular modality of knowledge limits comprehensive cross-modal understanding. In this paper, we propose the Multimodal Reasoning with Multimodal Knowledge Graph (MR-MKG) method, which leverages multimodal knowledge graphs (MMKGs) to learn rich and semantic knowledge across modalities, significantly enhancing the multimodal reasoning capabilities of LLMs. In particular, a relation graph attention network is utilized for encoding MMKGs and a cross-modal alignment module is designed for optimizing image-text alignment. An MMKG-grounded dataset is constructed to equip LLMs with initial expertise in multimodal reasoning through pretraining.…
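
To make the phrase "relation graph attention network" more concrete, the following is a minimal pure-Python sketch of one relation-aware attention layer over knowledge-graph triples. It is an illustration under stated assumptions, not the MR-MKG implementation: the function name `relation_graph_attention`, the GAT-style scoring over the concatenated target, neighbour, and relation vectors, the "neighbour plus relation" message, and the toy graph are all hypothetical. Image nodes are assumed to arrive already encoded as feature vectors.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z


def relation_graph_attention(node_feats, rel_feats, triples, W, W_r, a):
    """One relation-aware graph attention layer (illustrative assumption).

    node_feats: dict node -> feature vector (text or image nodes alike)
    rel_feats:  dict relation -> feature vector
    triples:    list of (head, relation, tail); messages flow head -> tail
    W, W_r:     projection matrices for nodes and relations
    a:          attention vector of length 3 * d_out
    Returns (updated node embeddings, attention weights per target node).
    """
    proj_n = {n: matvec(W, x) for n, x in node_feats.items()}
    proj_r = {r: matvec(W_r, x) for r, x in rel_feats.items()}

    # Group incoming (head, relation) pairs by their tail node.
    incoming = {}
    for h, r, t in triples:
        incoming.setdefault(t, []).append((h, r))

    out, attn = {}, {}
    for n in node_feats:
        edges = incoming.get(n, [])
        if not edges:
            # No neighbours: keep the projected feature unchanged.
            out[n] = proj_n[n]
            continue
        # Score each edge from [target || neighbour || relation].
        scores = []
        for h, r in edges:
            concat = proj_n[n] + proj_n[h] + proj_r[r]
            scores.append(leaky_relu(sum(ai * ci for ai, ci in zip(a, concat))))
        # Numerically stable softmax over the incoming edges of this node.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Aggregate messages; "neighbour + relation" is an assumed message form.
        d = len(proj_n[n])
        agg = [0.0] * d
        for alpha, (h, r) in zip(alphas, edges):
            for i in range(d):
                agg[i] += alpha * (proj_n[h][i] + proj_r[r][i])
        out[n] = agg
        attn[n] = dict(zip(edges, alphas))
    return out, attn


def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]


# Toy multimodal graph: two text entities and one (pre-encoded) image node.
random.seed(0)
d_in, d_out = 4, 3
nodes = {name: rand_vec(d_in) for name in ["cat", "animal", "img_cat"]}
rels = {name: rand_vec(d_in) for name in ["is_a", "depicts", "has_instance"]}
triples = [
    ("cat", "is_a", "animal"),
    ("img_cat", "depicts", "cat"),
    ("animal", "has_instance", "cat"),
]
W = [rand_vec(d_in) for _ in range(d_out)]
W_r = [rand_vec(d_in) for _ in range(d_out)]
a = rand_vec(3 * d_out)

out, attn = relation_graph_attention(nodes, rels, triples, W, W_r, a)
print(attn["cat"])  # two weights (image and text neighbour) that sum to 1
```

In this sketch the node `cat` attends over one image neighbour and one text neighbour, which is the sense in which a single attention layer can mix knowledge from both modalities. A practical system would use learned parameters and a tensor library rather than random weights and Python lists.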

Videos

Multimodal Reasoning with Multimodal Knowledge Graph · Underline

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling