MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning
Thang Nguyen, Peter Chin, Yu-Wing Tai

TL;DR
MA-RAG introduces a multi-agent framework for retrieval-augmented generation that decomposes complex reasoning tasks into collaborative steps, significantly improving performance and interpretability across diverse QA benchmarks.
Contribution
It presents a novel multi-agent architecture with specialized agents for each reasoning stage, enhancing modularity, interpretability, and performance over existing RAG methods.
Findings
Outperforms standalone LLMs and existing RAG methods across multiple benchmarks.
Small LLaMA3-8B with MA-RAG surpasses larger standalone models.
Achieves state-of-the-art results on multi-hop datasets with larger models.
Abstract
We present MA-RAG, a Multi-Agent framework for Retrieval-Augmented Generation (RAG) that addresses the inherent ambiguities and reasoning challenges in complex information-seeking tasks. Unlike conventional RAG methods that rely on end-to-end fine-tuning or isolated component enhancements, MA-RAG orchestrates a collaborative set of specialized AI agents: Planner, Step Definer, Extractor, and QA Agents, each responsible for a distinct stage of the RAG pipeline. By decomposing tasks into subtasks such as query disambiguation, evidence extraction, and answer synthesis, and enabling agents to communicate intermediate reasoning via chain-of-thought prompting, MA-RAG progressively refines retrieval and synthesis while maintaining modular interpretability. Extensive experiments on multi-hop and ambiguous QA benchmarks, including NQ, HotpotQA, 2WikimQA, and TriviaQA, demonstrate that MA-RAG…
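The agent roles named in the abstract can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: every function name, the `StepTrace` record, and the toy corpus are hypothetical, and simple word overlap stands in for both the LLM calls and the retriever that a real system would use.

```python
from dataclasses import dataclass
from typing import List

# Toy corpus standing in for a real document index (illustrative data only).
CORPUS = [
    "Paris is the capital of France.",
    "The Seine flows through Paris.",
    "Berlin is the capital of Germany.",
]


@dataclass
class StepTrace:
    step: str            # sub-task produced by the planner
    query: str           # retrieval query produced by the step definer
    evidence: List[str]  # passages kept by the extractor
    answer: str          # intermediate answer from the QA agent


def _words(text: str) -> set:
    # Lowercased word set with basic punctuation stripped.
    return {w.strip(".,?").lower() for w in text.split()}


def planner(question: str) -> List[str]:
    # Assumption: a real planner would prompt an LLM; this stub splits on " and ".
    return [part.strip() for part in question.rstrip("?").split(" and ")]


def step_definer(step: str, history: List[StepTrace]) -> str:
    # Fold the previous intermediate answer into the query, if there is one.
    context = history[-1].answer if history else ""
    return f"{step} {context}".strip()


def retrieve(query: str, k: int = 2) -> List[str]:
    # Rank passages by word overlap with the query (stand-in for a real retriever).
    ranked = sorted(CORPUS, key=lambda d: len(_words(d) & _words(query)), reverse=True)
    return ranked[:k]


def extractor(query: str, passages: List[str]) -> List[str]:
    # Keep only passages that share at least two words with the query.
    return [p for p in passages if len(_words(p) & _words(query)) >= 2]


def qa_agent(step: str, evidence: List[str]) -> str:
    # Assumption: a real QA agent would generate text; this stub picks the
    # evidence passage that overlaps most with the current step.
    if not evidence:
        return "unknown"
    return max(evidence, key=lambda p: len(_words(p) & _words(step)))


def run(question: str) -> List[StepTrace]:
    # Plan once, then define, retrieve, extract, and answer for each step.
    history: List[StepTrace] = []
    for step in planner(question):
        query = step_definer(step, history)
        evidence = extractor(query, retrieve(query))
        history.append(StepTrace(step, query, evidence, qa_agent(step, evidence)))
    return history
```

Calling `run("What is the capital of France and which river flows through it?")` returns one `StepTrace` per planned step. Keeping the query, evidence, and answer for every step is what makes the intermediate reasoning inspectable, which is the kind of modular interpretability the abstract describes.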
Peer Reviews
Decision · Submitted to ICLR 2026
- The idea of wrapping RAG with a multi-agent system is interesting.
- The paper presentation is clear, especially Section 3.1, along with Table 1 and Table 2, which give the methodological design and empirical evaluation.
- The comparison of Figure 3 is informative.
- The overall technical novelty and theoretical contribution are not adequate.
- Figures 1 & 2 could be annotated more clearly to show information flow and agent triggers.
- The organization should be further polished; the current related work section is long.
The paper introduces a structured, modular decomposition of RAG with collaborative reasoning and on-demand agent invocation, which is distinct from prior iterative or monolithic designs. Experiments span multiple benchmarks and domains, with robust ablations that validate each design component.
Multi-agent coordination introduces additional latency and token usage. Although discussed in Section 4.3, quantitative runtime–cost analysis (beyond response time) is limited. Current evaluation focuses on QA; broader testing (e.g., long-form summarization, reasoning-heavy retrieval) would demonstrate wider applicability. The relation to other multi-agent LLM coordination systems (e.g., MetaGPT, AgentVerse) could be expanded to highlight MA-RAG’s distinct innovations.
- The paper is well-structured and clearly written, with detailed illustrations explaining the multi-agent workflow.
- The idea of decomposing the RAG pipeline into distinct reasoning agents is conceptually sound.
- The paper conducts extensive benchmarking across multiple datasets and scales.
- The motivation of MA-RAG is not clearly articulated beyond being a conceptual adaptation of existing retrieval-augmented generation (RAG) pipelines into a multi-agent form. The introduction mainly highlights general RAG challenges (ambiguity, multi-hop reasoning) but does not provide a concrete insight into why a multi-agent decomposition fundamentally improves these issues beyond modular orchestration. MA-RAG largely repackages these ideas under a multi-agent setting without introducing new a
Taxonomy
Topics: Semantic Web and Ontologies · Data Management and Algorithms · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Byte Pair Encoding
