Meta-RAG on Large Codebases Using Code Summarization
Vali Tawosi, Salwa Alamir, Xiaomo Liu, Manuela Veloso

TL;DR
This paper introduces Meta-RAG, a novel retrieval-augmented generation system that condenses large codebases into summaries for effective bug localization using LLMs, achieving state-of-the-art accuracy.
Contribution
The paper presents Meta-RAG, a new multi-agent system that uses summaries to improve bug localization in large codebases with LLMs, reducing code size by nearly 80%.
Findings
Meta-RAG achieves 84.67% file-level bug localization accuracy.
Meta-RAG condenses codebases by an average of 79.8%.
State-of-the-art performance on the SWE-bench Lite dataset.
Abstract
Large Language Model (LLM) systems have been at the forefront of applied Artificial Intelligence (AI) research in a multitude of domains. One such domain is software development, where researchers have pushed the automation of a number of code tasks through LLM agents. Software development is a complex ecosystem that stretches far beyond code implementation and well into the realm of code maintenance. In this paper, we propose a multi-agent system to localize bugs in large pre-existing codebases using information retrieval and LLMs. Our system introduces a novel Retrieval Augmented Generation (RAG) approach, Meta-RAG, where we utilize summaries to condense codebases by an average of 79.8% into a compact, structured, natural language representation. We then use an LLM agent to determine which parts of the codebase are critical for bug resolution, i.e., bug localization. We demonstrate…
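The abstract describes the pipeline only at a high level: condense the codebase into structured natural-language summaries, then let an LLM agent pick the parts relevant to the bug. The toy sketch below illustrates that two-stage shape under loud assumptions. The summaries are hard-coded strings standing in for LLM-generated ones. Condensation is measured in characters, although the metric behind the paper's 79.8% figure is not stated here. The LLM agent is replaced by a simple word-overlap ranking. All names (`Function`, `condense`, `reduction_percent`, `localize`) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    source: str   # original code
    summary: str  # hypothetical: in Meta-RAG an LLM would generate this text


def condense(files):
    """Replace each file's code with a structured natural-language summary."""
    condensed = {}
    for path, funcs in files.items():
        lines = [f"file {path}:"] + [f"  {f.name}: {f.summary}" for f in funcs]
        condensed[path] = "\n".join(lines)
    return condensed


def reduction_percent(files, condensed):
    """Size reduction in characters (an assumed stand-in for the paper's metric)."""
    original = sum(len(f.source) for funcs in files.values() for f in funcs)
    compact = sum(len(text) for text in condensed.values())
    return 100.0 * (1 - compact / original)


def localize(condensed, bug_report, top_k=1):
    """Stand-in for the LLM agent: rank files by word overlap with the report."""
    words = set(bug_report.lower().split())
    ranked = sorted(
        condensed,
        key=lambda path: len(words & set(condensed[path].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical two-file codebase.
files = {
    "utils/dates.py": [Function(
        "parse_date",
        'def parse_date(text):\n    """Parse a date string."""\n'
        "    from datetime import datetime\n"
        '    return datetime.strptime(text, "%Y-%m-%d")\n',
        "parses a date string into a datetime",
    )],
    "io/export.py": [Function(
        "write_csv",
        "def write_csv(path, rows):\n    import csv\n"
        '    with open(path, "w", newline="") as fh:\n'
        "        csv.writer(fh).writerows(rows)\n",
        "writes rows to a csv file",
    )],
}
condensed = condense(files)
report = "date string parsing fails for ISO timestamps"
print(localize(condensed, report))  # ['utils/dates.py']
print(f"{reduction_percent(files, condensed):.1f}% smaller")
```

Ranking over summaries rather than raw code is the point of the design: the retriever (here a keyword match, in the paper an LLM agent) reads a much smaller, uniformly structured text, and only the files it selects need to be expanded back to full source.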
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Software Testing and Debugging Techniques
