mKG-RAG: Leveraging Multimodal Knowledge Graphs in Retrieval-Augmented Generation for Knowledge-intensive VQA

Xu Yuan, Liangbo Ning, Qingqing Ye, Wenqi Fan, and Qing Li

arXiv:2508.05318·cs.CV·April 28, 2026

TL;DR

mKG-RAG is a retrieval-augmented generation framework that integrates multimodal knowledge graphs into knowledge-intensive visual question answering, with the aim of improving answer accuracy and reliability.

Contribution

The paper proposes a method that combines multimodal KGs with RAG, using graph extraction and dual-stage retrieval to improve VQA performance.
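
As a rough illustration of what dual-stage retrieval over a knowledge graph can look like, the sketch below first ranks whole subgraphs against the question and then ranks individual triples inside the retained subgraphs. This summary does not describe how mKG-RAG defines its two stages, so the coarse-to-fine split, the `SUBGRAPHS` data, the bag-of-words `embed` function, and the `retrieve` helper are all assumptions made for illustration. The image modality is omitted entirely, and a real system would use a learned multimodal encoder rather than word counts.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in (assumption) for a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical knowledge base: each entry is a small subgraph of
# (head, relation, tail) triples. Not taken from the paper.
SUBGRAPHS = {
    "eiffel_tower": [
        ("Eiffel Tower", "located in", "Paris"),
        ("Eiffel Tower", "completed in", "1889"),
        ("Eiffel Tower", "designed by", "Gustave Eiffel"),
    ],
    "golden_gate": [
        ("Golden Gate Bridge", "located in", "San Francisco"),
        ("Golden Gate Bridge", "opened in", "1937"),
    ],
}


def triple_text(triple):
    """Flatten a triple into a plain string so it can be embedded."""
    return " ".join(triple)


def retrieve(query, top_graphs=1, top_triples=2):
    """Two-stage retrieval sketch: subgraphs first, then triples."""
    q = embed(query)

    # Stage 1 (assumed): coarse ranking of whole subgraphs.
    graph_scores = {
        name: cosine(q, embed(" ".join(triple_text(t) for t in triples)))
        for name, triples in SUBGRAPHS.items()
    }
    kept = sorted(graph_scores, key=graph_scores.get, reverse=True)[:top_graphs]

    # Stage 2 (assumed): fine ranking of triples within the kept subgraphs.
    candidates = [t for name in kept for t in SUBGRAPHS[name]]
    candidates.sort(key=lambda t: cosine(q, embed(triple_text(t))), reverse=True)
    return candidates[:top_triples]


facts = retrieve("eiffel tower completed in which year")
for head, relation, tail in facts:
    print(head, relation, tail)
```

The retrieved triples would then be serialized into the prompt of the answer generator. Restricting the second stage to the subgraphs kept by the first is what stops unrelated facts (here, the bridge entries) from reaching the model.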

Findings

01. Outperforms existing methods on knowledge-based VQA tasks.

02. Achieves new state-of-the-art results.

03. Effectively leverages structured multimodal knowledge.

Abstract

Retrieval-Augmented Generation (RAG) has emerged as an effective paradigm for expanding the knowledge capacity of Multimodal Large Language Models (MLLMs) by incorporating external knowledge sources into the generation process, and has been widely adopted for knowledge-based Visual Question Answering (VQA). Despite impressive advancements, vanilla RAG-based VQA methods that rely on unstructured documents and overlook the structural relations among knowledge elements frequently introduce irrelevant or misleading content, degrading answer accuracy and reliability. To overcome these challenges, a promising solution is to integrate multimodal knowledge graphs (KGs) into RAG-based VQA frameworks, thereby enhancing generation through structured multimodal knowledge. To this end, this paper proposes mKG-RAG, a novel retrieval-augmented generation framework built upon multimodal KGs for…

Code & Models

Repositories

xandery-geek/mKG-RAG (GitHub)