Meta-RAG on Large Codebases Using Code Summarization

Vali Tawosi; Salwa Alamir; Xiaomo Liu; Manuela Veloso

arXiv:2508.02611 · cs.SE · August 8, 2025

TL;DR

This paper introduces Meta-RAG, a retrieval-augmented generation approach that condenses large codebases into natural-language summaries so that an LLM can localize bugs, reaching state-of-the-art accuracy on SWE-bench Lite.

Contribution

The paper presents Meta-RAG, a multi-agent system that uses code summaries to localize bugs in large codebases with LLMs. The summaries condense the codebase by an average of 79.8%.

Findings

1. Meta-RAG achieves 84.67% file-level bug localization accuracy.
2. Meta-RAG condenses codebases by an average of 79.8%.
3. Meta-RAG reaches state-of-the-art performance on the SWE-bench Lite dataset.

Abstract

Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains. One such domain is software development, where researchers have pushed the automation of a number of code tasks through LLM agents. Software development is a complex ecosystem that stretches far beyond code implementation and well into the realm of code maintenance. In this paper, we propose a multi-agent system to localize bugs in large pre-existing codebases using information retrieval and LLMs. Our system introduces a novel Retrieval Augmented Generation (RAG) approach, Meta-RAG, where we utilize summaries to condense codebases by an average of 79.8% into a compact, structured, natural language representation. We then use an LLM agent to determine which parts of the codebase are critical for bug resolution, i.e., bug localization. We demonstrate…
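The idea of condensing code into a compact summary and measuring how much was removed can be illustrated with a rough sketch. This does not reproduce the paper's summarization method. As a labeled stand-in, it uses Python's standard `ast` module to keep only each class or function name plus the first line of its docstring. The function names `summarize_source` and `condensation_ratio`, and the sample input, are hypothetical and chosen only for illustration.

```python
import ast


def summarize_source(source: str) -> str:
    """Build a one-line-per-definition outline of a Python source string.

    Stand-in for a real summarizer (assumption): keeps the definition name
    and the first docstring line, and drops the implementation body.
    """
    tree = ast.parse(source)
    outline = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            doc = (ast.get_docstring(node) or "").strip()
            first_line = doc.splitlines()[0] if doc else "(no docstring)"
            outline.append(f"{kind} {node.name}: {first_line}")
    return "\n".join(outline)


def condensation_ratio(original: str, summary: str) -> float:
    """Return the fraction of characters removed by summarization."""
    if not original:
        return 0.0
    return 1.0 - len(summary) / len(original)


# Hypothetical input, used only to exercise the sketch.
EXAMPLE = '''
def parse_config(path):
    """Load settings from a file."""
    with open(path) as handle:
        data = handle.read()
    return data.splitlines()
'''

summary = summarize_source(EXAMPLE)
ratio = condensation_ratio(EXAMPLE, summary)
print(summary)
print(f"condensed by {ratio:.1%}")
```

In a retrieval setting, an outline like this (rather than the full source) would be what is passed to the LLM agent when asking which files are relevant to a bug report. The character-based ratio is one simple way to quantify condensation; the listing does not state which unit the reported 79.8% figure uses.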

Taxonomy

Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques