MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG

Woosang Lim; Zekun Li; Gyuwan Kim; Sungyoung Ji; HyeonJung Kim; Kyuri Choi; Jin Hyuk Lim; Kyungpyo Park; William Yang Wang

arXiv:2505.06569 · cs.CL · May 22, 2025

TL;DR

MacRAG introduces a hierarchical, multi-scale retrieval framework that enhances long-context reasoning in large language models by adaptively merging relevant document segments, improving retrieval precision and context coverage.

Contribution

We propose MacRAG, a novel hierarchical RAG system that compresses and partitions documents into multiple granularities, enabling adaptive context expansion for better long-context reasoning.

Findings

01 Outperforms baseline RAG in multi-hop question answering tasks.

02 Effective long-context reasoning with Llama-3.1-8B, Gemini-1.5-pro, and GPT-4o.

03 Scalable and efficient for real-world applications.

Abstract

Long-context large language models (LC LLMs) combined with retrieval-augmented generation (RAG) hold strong potential for complex multi-hop and large-document tasks. However, existing RAG systems often suffer from imprecise retrieval, incomplete context coverage under constrained windows, and fragmented information from suboptimal context construction. We introduce Multi-scale Adaptive Context RAG (MacRAG), a hierarchical RAG framework that compresses and partitions documents into coarse-to-fine granularities, then adaptively merges relevant contexts through real-time chunk- and document-level expansions. By initiating with finest-level retrieval and progressively incorporating broader, higher-level context, MacRAG constructs effective query-specific long contexts, optimizing both precision and coverage. Evaluations on challenging LongBench expansions of HotpotQA, 2WikiMultihopQA, and…
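The retrieve-then-expand flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the compression step is omitted, relevance is a toy word-overlap score rather than a learned retriever, and "expansion" is assumed here to mean pulling in neighbouring chunks of each hit and merging them per document. All names (`split_words`, `build_index`, `retrieve`, `hops`) are hypothetical.

```python
from collections import defaultdict


def split_words(text, size):
    """Split text into consecutive, non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(docs, chunk_words=8, slice_words=4):
    """Partition each document into coarse chunks, then each chunk into fine slices.

    Assumption: the paper also compresses chunks before slicing; that step
    is skipped here to keep the sketch dependency-free.
    """
    chunks = {}  # (doc_id, chunk_id) -> chunk text
    slices = []  # (doc_id, chunk_id, slice text)
    for doc_id, text in docs.items():
        for chunk_id, chunk in enumerate(split_words(text, chunk_words)):
            chunks[(doc_id, chunk_id)] = chunk
            for piece in split_words(chunk, slice_words):
                slices.append((doc_id, chunk_id, piece))
    return chunks, slices


def score(query, text):
    """Toy relevance: number of distinct query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, chunks, slices, top_k=2, hops=1):
    """Retrieve at the finest level, then scale up to chunks and documents."""
    # 1. Finest-level retrieval: rank slices and keep the best top_k.
    ranked = sorted(slices, key=lambda s: score(query, s[2]), reverse=True)
    # 2. Chunk-level expansion: map each hit to its parent chunk and
    #    add neighbouring chunks within `hops` positions (assumed rule).
    selected = set()
    for doc_id, chunk_id, _ in ranked[:top_k]:
        for neighbour in range(chunk_id - hops, chunk_id + hops + 1):
            if (doc_id, neighbour) in chunks:
                selected.add((doc_id, neighbour))
    # 3. Document-level merge: join selected chunks in their original order.
    merged = defaultdict(list)
    for doc_id, chunk_id in sorted(selected):
        merged[doc_id].append(chunks[(doc_id, chunk_id)])
    return {doc_id: " ".join(parts) for doc_id, parts in merged.items()}
```

Calling `retrieve(query, *build_index(docs), top_k=1, hops=0)` returns only the parent chunk of the best-matching slice; raising `hops` widens the context around each hit. In this sketch that single parameter is what trades precision against coverage.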

Code & Models

Repositories

leezekun/macrag (Official)

Taxonomy

Topics

Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare

Methods

Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece