M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation

Hyeongcheol Park; Jiyoung Seo; Jaewon Mun; Hogun Park; Wonmin Byeon; Sung June Kim; Hyeonsoo Im; JeungSub Lee; Sangpil Kim

arXiv:2512.20136·cs.CL·April 14, 2026

TL;DR

M$^3$KG-RAG enhances multimodal reasoning in large language models by retrieving and grounding multi-hop multimodal knowledge from knowledge graphs, improving answer faithfulness and relevance.

Contribution

It introduces a multi-hop multimodal knowledge graph and a grounding and pruning method to improve multimodal retrieval-augmented generation.
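As a rough illustration of the idea named above, the sketch below collects multi-hop triples from a toy knowledge graph and then prunes those whose tail entity is weakly aligned with a query embedding. It is not the paper's method: the graph, the 2-D embeddings, the function names `multi_hop_triples` and `prune`, and the cosine-threshold rule are all hypothetical stand-ins chosen only to make the retrieve-then-prune pattern concrete.

```python
from collections import deque
import math

# Toy graph: entity -> list of (relation, neighbor). All entries are hypothetical.
GRAPH = {
    "violin": [("is_a", "string instrument"), ("played_with", "bow")],
    "string instrument": [("part_of", "orchestra")],
    "bow": [("made_of", "horsehair")],
    "orchestra": [],
    "horsehair": [],
}

# Hand-made 2-D vectors standing in for a shared multimodal embedding space.
EMB = {
    "violin": (1.0, 0.0),
    "string instrument": (0.9, 0.1),
    "bow": (0.7, 0.3),
    "orchestra": (0.6, 0.8),
    "horsehair": (0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multi_hop_triples(start, max_hops):
    """Breadth-first search returning (head, relation, tail) triples within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop budget
        for relation, neighbor in GRAPH.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return triples


def prune(triples, query_vec, threshold):
    """Keep triples whose tail embedding has cosine similarity >= threshold with the query."""
    return [t for t in triples if cosine(EMB[t[2]], query_vec) >= threshold]


candidates = multi_hop_triples("violin", max_hops=2)
kept = prune(candidates, query_vec=(1.0, 0.0), threshold=0.8)
```

With these toy values, the two-hop expansion yields four triples, and the pruning step keeps only the two whose tails ("string instrument" and "bow") lie close to the query vector. A real system would replace the threshold rule with whatever grounding and pruning criterion the paper defines.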

Findings

01. Significant improvement in multimodal reasoning accuracy.

02. Enhanced grounding and relevance in generated responses.

03. Effective multi-hop knowledge retrieval from MMKGs.

Abstract

Retrieval-Augmented Generation (RAG) has recently been extended to multimodal settings, connecting multimodal large language models (MLLMs) with vast corpora of external knowledge such as multimodal knowledge graphs (MMKGs). Despite its recent success, multimodal RAG in the audio-visual domain remains challenging due to 1) the limited modality coverage and multi-hop connectivity of existing MMKGs, and 2) retrieval based solely on similarity in a shared multimodal embedding space, which fails to filter out off-topic or redundant knowledge. To address these limitations, we propose M$^3$KG-RAG, a Multi-hop Multimodal Knowledge Graph-enhanced RAG that retrieves query-aligned audio-visual knowledge from MMKGs, improving reasoning depth and answer faithfulness in MLLMs. Specifically, we devise a lightweight multi-agent pipeline to construct a multi-hop MMKG (M$^3$KG), which contains…

Code & Models

Repositories

https://kuai-lab.github.io/cvpr2026m3kgrag