Multimodal Reasoning with Multimodal Knowledge Graph
Junlin Lee, Yequan Wang, Jing Li, Min Zhang

TL;DR
This paper introduces MR-MKG, a novel method that uses multimodal knowledge graphs to improve large language models' ability to perform multimodal reasoning, reducing hallucinations and enhancing understanding across image and text modalities.
Contribution
The paper presents a new approach leveraging multimodal knowledge graphs and a relation graph attention network to significantly enhance multimodal reasoning in LLMs with minimal additional parameters.
Findings
MR-MKG outperforms previous models on multimodal question answering.
Achieves these results while training only a small set of parameters, about 2.25% of the LLM's parameter count.
Constructs a new MMKG-grounded dataset for training and evaluation.
Abstract
Multimodal reasoning with large language models (LLMs) often suffers from hallucinations and the presence of deficient or outdated knowledge within LLMs. Some approaches have sought to mitigate these issues by employing textual knowledge graphs, but their singular modality of knowledge limits comprehensive cross-modal understanding. In this paper, we propose the Multimodal Reasoning with Multimodal Knowledge Graph (MR-MKG) method, which leverages multimodal knowledge graphs (MMKGs) to learn rich and semantic knowledge across modalities, significantly enhancing the multimodal reasoning capabilities of LLMs. In particular, a relation graph attention network is utilized for encoding MMKGs and a cross-modal alignment module is designed for optimizing image-text alignment. An MMKG-grounded dataset is constructed to equip LLMs with initial expertise in multimodal reasoning through pretraining.…
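To make the "relation graph attention network" mentioned in the abstract more concrete, here is a minimal NumPy sketch of one relation-aware attention layer over knowledge-graph triples. It is an illustration under assumptions, not the authors' implementation. The function name, the scoring form (a LeakyReLU over concatenated head, relation, and tail projections), the message form (neighbour plus relation), and all shapes are hypothetical choices made for this example.

```python
import numpy as np


def relation_graph_attention(node_feats, rel_feats, triples, W, W_r, a, slope=0.2):
    """One relation-aware graph attention layer (illustrative assumption only).

    node_feats: (N, d_in) entity embeddings (text and image entities alike)
    rel_feats:  (R, d_in) relation embeddings
    triples:    list of (head, relation, tail) index tuples
    W, W_r:     (d_in, d_out) projections for entities and relations
    a:          (3 * d_out,) attention vector
    """
    h = node_feats @ W   # projected entities, (N, d_out)
    r = rel_feats @ W_r  # projected relations, (R, d_out)
    heads = np.array([t[0] for t in triples])
    rels = np.array([t[1] for t in triples])
    tails = np.array([t[2] for t in triples])

    # Unnormalised score per edge, computed from [head || relation || tail].
    edge_in = np.concatenate([h[heads], r[rels], h[tails]], axis=1)
    scores = edge_in @ a
    scores = np.where(scores > 0, scores, slope * scores)  # LeakyReLU

    # Entities without outgoing edges keep their projected features.
    out = h.copy()
    for node in np.unique(heads):
        mask = heads == node
        s = scores[mask]
        alpha = np.exp(s - s.max())  # numerically stable softmax
        alpha /= alpha.sum()
        # Message from each neighbour: tail embedding plus relation embedding.
        msgs = h[tails[mask]] + r[rels[mask]]
        out[node] = alpha @ msgs
    return out


# Toy graph: 4 entities, 2 relation types, 3 triples (all values random).
rng = np.random.default_rng(0)
nodes = rng.normal(size=(4, 8))
relations = rng.normal(size=(2, 8))
toy_triples = [(0, 0, 1), (0, 1, 2), (3, 0, 0)]
W = rng.normal(size=(8, 5))
W_r = rng.normal(size=(8, 5))
a = rng.normal(size=15)
encoded = relation_graph_attention(nodes, relations, toy_triples, W, W_r, a)
print(encoded.shape)
```

In this sketch the relation embedding enters both the attention score and the message, so two edges to the same neighbour under different relations can receive different weights. A trained model would learn `W`, `W_r`, and `a`; here they are random placeholders.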
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling
