MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation

Chi-Hsiang Hsiao; Yi-Cheng Wang; Tzung-Sheng Lin; Yi-Ren Yeh; Chu-Song Chen

arXiv:2512.20626 · cs.AI · April 21, 2026

TL;DR

MegaRAG introduces a multimodal knowledge graph-based retrieval augmented generation method that integrates visual cues for improved reasoning over complex, domain-specific content, outperforming existing approaches in question answering tasks.

Contribution

It presents a novel multimodal knowledge graph framework that incorporates visual information into retrieval and generation, enhancing reasoning over multimodal content.

Findings

01 Outperforms existing RAG methods on textual question answering tasks.

02 Demonstrates improved reasoning with visual cues in knowledge graphs.

03 Effective across both global and fine-grained question answering.

Abstract

Retrieval-augmented generation (RAG) enables large language models (LLMs) to dynamically access external information, which is powerful for answering questions over previously unseen documents. Nonetheless, RAG systems struggle with high-level conceptual understanding and holistic comprehension because of limited context windows, which constrain their ability to perform deep reasoning over long-form, domain-specific content such as full-length books. To address this problem, knowledge graphs (KGs) have been leveraged to provide entity-centric structure and hierarchical summaries, offering more structured support for reasoning. However, existing KG-based RAG solutions remain restricted to text-only inputs and fail to leverage the complementary insights provided by other modalities such as vision. On the other hand, reasoning from visual documents requires textual, visual, and spatial cues into…
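To make the idea of entity-centric retrieval over a graph with both textual and visual nodes more concrete, here is a minimal sketch. It is not the MegaRAG implementation: the `Entity` class, the `modality` labels, the keyword-overlap scoring, and the one-hop neighbor expansion are all assumptions chosen for illustration, and the toy entities are invented. A real system would presumably use learned embeddings and a model-built graph rather than word overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical node type: a named entity with a short description.
    name: str
    description: str
    modality: str = "text"  # assumed labels: "text" or "image"
    neighbors: set = field(default_factory=set)


def build_graph(entities, relations):
    """Index entities by name and link each related pair in both directions."""
    graph = {e.name: e for e in entities}
    for a, b in relations:
        graph[a].neighbors.add(b)
        graph[b].neighbors.add(a)
    return graph


def tokenize(text):
    """Lowercase word set with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(graph, query, top_k=1):
    """Pick the top_k entities by word overlap, then add their neighbors."""
    q = tokenize(query)

    def overlap(e):
        return len(q & tokenize(e.name + " " + e.description))

    # Highest overlap first; ties broken by name for determinism.
    ranked = sorted(graph.values(), key=lambda e: (-overlap(e), e.name))
    seeds = [e for e in ranked[:top_k] if overlap(e) > 0]
    selected = {e.name for e in seeds}
    for e in seeds:
        selected |= e.neighbors  # one-hop expansion pulls in linked figures
    return sorted(selected)


def build_context(graph, names):
    """Format the selected entities as a context string for a generator."""
    return "\n".join(
        f"[{graph[n].modality}] {n}: {graph[n].description}" for n in names
    )


# Toy data, invented for this example.
entities = [
    Entity("Transformer", "Neural architecture based on self-attention."),
    Entity("Figure 1", "Diagram of the encoder and decoder stacks.", modality="image"),
    Entity("BLEU", "Metric used to score translation quality."),
]
graph = build_graph(entities, [("Transformer", "Figure 1")])
names = retrieve(graph, "How does the Transformer architecture work?")
context = build_context(graph, names)
print(context)
```

In this sketch the query matches the `Transformer` entity on text alone, and the graph edge is what brings the related figure node into the context, which is the kind of cross-modal link a text-only graph would lack.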

Code & Models

Repositories

ai-application-and-integration-lab/MegaRAG (GitHub)
