LLM Agents Improve Semantic Code Search
Sarthak Jain (University of Illinois Urbana-Champaign, Cisco), Aditya Dora (University of Illinois Urbana-Champaign), Ka Seng Sam (University of Illinois Urbana-Champaign), and Prabhat Singh (Cisco)

TL;DR
This paper presents an approach that combines Retrieval Augmented Generation (RAG) and ensemble methods with LLM agents to significantly improve semantic code search accuracy, especially for ambiguous or context-dependent queries.
Contribution
It introduces a RAG-powered agentic framework and multi-stream ensemble technique that enhance code search performance, outperforming existing methods on the CodeSearchNet dataset.
Findings
Success@10 of 78.2% on CodeSearchNet
Success@1 of 34.6% on CodeSearchNet
Significant improvement over prior methods
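For readers unfamiliar with the metric, here is a minimal sketch of how Success@k is commonly computed: the fraction of queries whose correct snippet appears among the top k retrieved results. The function name, argument names, and sample data are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def success_at_k(ranked_results, gold, k):
    """Fraction of queries whose gold snippet appears in the top-k results.

    ranked_results: one ranked list of snippet ids per query (best first).
    gold: the correct snippet id for each query, in the same order.
    Assumes the usual definition of Success@k; names are illustrative.
    """
    if not gold:
        raise ValueError("need at least one query")
    hits = sum(
        1 for ranked, answer in zip(ranked_results, gold) if answer in ranked[:k]
    )
    return hits / len(gold)


# Made-up rankings for three queries (not data from the paper).
ranked = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
gold = ["a", "y", "q"]

s_at_1 = success_at_k(ranked, gold, k=1)
s_at_2 = success_at_k(ranked, gold, k=2)
```

Under this definition Success@k can only stay flat or grow as k increases, which is consistent with the reported Success@10 being higher than Success@1.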
Abstract
Code search is a key task that programmers often perform while developing solutions to problems. Current methods perform poorly on prompts that are ambiguous or that require additional context from a codebase. We introduce Retrieval Augmented Generation (RAG) powered agents that inject information into user prompts, producing better inputs for embedding models. Using RAG, the agents enhance user queries with relevant details from GitHub repositories, making them more informative and contextually aligned. Additionally, we introduce a multi-stream ensemble approach which, when paired with the agentic workflow, improves retrieval accuracy, and which we deploy in an application called repo-rift.com. Experimental results on the CodeSearchNet dataset demonstrate that RepoRift significantly…
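The query-augmentation and ensemble idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, `augment` stands in for the RAG agent (a keyword lookup instead of an LLM with retrieval), and combining streams by taking the maximum similarity is one plausible ensemble rule, not necessarily the one the paper or RepoRift uses.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().replace("_", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def augment(query, context_notes):
    """Hypothetical stand-in for the RAG agent: append notes whose
    trigger keyword occurs in the query."""
    extra = [note for key, note in context_notes.items() if key in query.lower()]
    return " ".join([query] + extra)


def ensemble_search(streams, corpus, k=3):
    """Score each snippet by its best similarity across all query streams
    (an assumed ensemble rule), then return the top-k snippet names."""
    scores = {}
    for name, code in corpus.items():
        code_vec = embed(code)
        scores[name] = max(cosine(embed(s), code_vec) for s in streams)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Illustrative corpus and context; none of this comes from the paper.
corpus = {
    "parse_config": "def parse_config path load yaml settings file",
    "retry_request": "def retry_request url attempts backoff http get",
    "hash_password": "def hash_password password salt sha256 digest",
}
context_notes = {"login": "password salt digest"}

query = "secure the login secret"
augmented = augment(query, context_notes)
results = ensemble_search([query, augmented], corpus, k=1)
```

In this toy setup the raw query shares no terms with any snippet, so it cannot rank them on its own; the augmented stream adds vocabulary that matches `hash_password`, which is the kind of benefit the abstract attributes to injecting context into ambiguous queries.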
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · WordPiece · Residual Connection · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Dropout · Adam · Layer Normalization
