QA-Dragon: Query-Aware Dynamic RAG System for Knowledge-Intensive Visual Question Answering

Zhuohang Jiang; Pangjing Wu; Xu Yuan; Wenqi Fan; Qing Li

arXiv:2508.05197·cs.AI·March 17, 2026

TL;DR

QA-Dragon is a query-aware dynamic RAG system for knowledge-intensive visual question answering. It combines text and image retrieval to answer complex, multi-hop queries, with reported gains over baselines of 5.06% on single-source tasks and 6.35% on multi-source tasks.

Contribution

QA-Dragon introduces a domain router, which identifies the query's subject domain, and a search router, which dynamically selects a retrieval strategy. Together they enable multimodal, multi-turn, and multi-hop reasoning in VQA.

Findings

01. Outperforms baselines by 5.06% on single-source tasks
02. Achieves 6.35% improvement on multi-source tasks
03. Enhances reasoning performance in complex scenarios

Abstract

Retrieval-Augmented Generation (RAG) has been introduced to mitigate hallucinations in Multimodal Large Language Models (MLLMs) by incorporating external knowledge into the generation process, and it has become a widely adopted approach for knowledge-intensive Visual Question Answering (VQA). However, existing RAG methods typically retrieve from either text or images in isolation, limiting their ability to address complex queries that require multi-hop reasoning or up-to-date factual knowledge. To address this limitation, we propose QA-Dragon, a Query-Aware Dynamic RAG System for Knowledge-Intensive VQA. Specifically, QA-Dragon introduces a domain router to identify the query's subject domain for domain-specific reasoning, along with a search router that dynamically selects optimal retrieval strategies. By orchestrating both text and image search agents in a hybrid setup, our system…
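The two-router idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the keyword lists, the `Query` and `SearchPlan` types, and the `route_domain`, `route_search`, and `plan` functions are all hypothetical names invented for illustration, and simple keyword matching stands in for whatever model-based routing the paper actually uses.

```python
import re
from dataclasses import dataclass
from enum import Enum


class SearchPlan(Enum):
    """Hypothetical retrieval strategies a search router could choose from."""
    NONE = "none"      # answer directly, no retrieval
    TEXT = "text"      # text search agent only
    IMAGE = "image"    # image search agent only
    HYBRID = "hybrid"  # both agents


@dataclass
class Query:
    question: str
    has_image: bool = True


# Assumed keyword lists; a real router would more likely use a learned model.
DOMAIN_KEYWORDS = {
    "vehicle": ("car", "truck", "engine"),
    "food": ("dish", "recipe", "calories"),
}
FRESHNESS_CUES = ("latest", "current", "price", "today")
VISUAL_CUES = ("this", "that", "shown", "pictured")


def _words(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route_domain(query: Query) -> str:
    """Toy domain router: first domain whose keywords appear, else 'general'."""
    words = _words(query.question)
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if any(k in words for k in keywords):
            return domain
    return "general"


def route_search(query: Query) -> SearchPlan:
    """Toy search router: choose a retrieval strategy from surface cues."""
    words = _words(query.question)
    needs_text = any(c in words for c in FRESHNESS_CUES)
    needs_image = query.has_image and any(c in words for c in VISUAL_CUES)
    if needs_text and needs_image:
        return SearchPlan.HYBRID
    if needs_text:
        return SearchPlan.TEXT
    if needs_image:
        return SearchPlan.IMAGE
    return SearchPlan.NONE


def plan(query: Query) -> tuple:
    """Run both routers and return (domain, search plan)."""
    return route_domain(query), route_search(query)
```

For example, `plan(Query("What is the current price of this car?"))` routes to the assumed `vehicle` domain with a hybrid plan, because the question both refers to the image ("this") and asks for up-to-date facts ("current", "price").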
