LLM Agents Improve Semantic Code Search

Sarthak Jain (University of Illinois Urbana-Champaign; Cisco), Aditya Dora (University of Illinois Urbana-Champaign), Ka Seng Sam (University of Illinois Urbana-Champaign), and Prabhat Singh (Cisco)

arXiv:2408.11058·cs.SE·August 22, 2024

Open Access

TL;DR

This paper uses Retrieval Augmented Generation (RAG) powered LLM agents, combined with a multi-stream ensemble, to improve semantic code search accuracy, particularly on ambiguous or context-dependent queries.

Contribution

It introduces a RAG-powered agentic framework and multi-stream ensemble technique that enhance code search performance, outperforming existing methods on the CodeSearchNet dataset.

Findings

01 Success@10: 78.2%

02 Success@1: 34.6%

03 Significant improvement over prior methods
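
For readers unfamiliar with the metric, Success@k is commonly defined as the fraction of queries whose correct result appears among the top k retrieved items. The sketch below computes it under that assumption; the function name `success_at_k` and the toy data are illustrative and not taken from the paper.

```python
from typing import Hashable, Sequence


def success_at_k(
    rankings: Sequence[Sequence[Hashable]],
    gold: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of queries whose gold item appears in the top-k results.

    rankings[i] is the ranked result list for query i (best first);
    gold[i] is the single correct item for query i.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(
        1 for ranking, target in zip(rankings, gold) if target in ranking[:k]
    )
    return hits / len(gold)


# Toy data (hypothetical): three queries, each with a ranked list of snippet ids.
toy_rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q", "r"]]
toy_gold = ["a", "y", "missing"]

print(success_at_k(toy_rankings, toy_gold, k=1))
print(success_at_k(toy_rankings, toy_gold, k=2))
```

Under this definition Success@1 can never exceed Success@10, since any hit at rank 1 is also a hit within the top 10.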

Abstract

Code search is a key task that programmers often perform while developing solutions to problems. Current methodologies perform poorly on prompts that are ambiguous or that require additional context from the codebase. We introduce Retrieval Augmented Generation (RAG) powered agents that inject information into user prompts, giving embedding models better inputs. By utilizing RAG, agents enhance user queries with relevant details from GitHub repositories, making them more informative and contextually aligned. Additionally, we introduce a multi-stream ensemble approach which, when paired with an agentic workflow, can obtain improved retrieval accuracy, and which we deploy in an application called repo-rift.com. Experimental results on the CodeSearchNet dataset demonstrate that RepoRift significantly…
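
The two ideas in the abstract, enriching a query with retrieved context and combining several retrieval streams, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, `augment_query` is a hypothetical substitute for the agent step, and the streams are fused with reciprocal rank fusion, which is an assumption here since the abstract does not say how the ensemble combines results.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query, corpus):
    """Return snippet ids ordered by descending similarity to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda sid: cosine(q, embed(corpus[sid])), reverse=True)


def augment_query(query, context_docs, top_n=1):
    """Hypothetical agent step: append the most relevant context passages."""
    q = embed(query)
    best = sorted(context_docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return " ".join([query, *best[:top_n]])


def ensemble_search(streams, corpus, k=60):
    """Fuse the rankings of several query streams with reciprocal rank fusion."""
    scores = defaultdict(float)
    for stream_query in streams:
        for position, sid in enumerate(rank(stream_query, corpus), start=1):
            scores[sid] += 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus and repository context (all hypothetical).
corpus = {
    "parse_json": "def parse_json(text): return json.loads(text)",
    "retry": "def retry_request(url, attempts): send http request again on failure with backoff",
    "read_file": "def read_file(path): open path and return contents",
}
context_docs = [
    "The client module wraps http request calls and handles failure with backoff."
]

query = "make the client more resilient"
augmented = augment_query(query, context_docs)
results = ensemble_search([query, augmented], corpus)
print(results)
```

In this toy setup the raw query shares no vocabulary with any snippet, so it cannot separate them; the augmented stream borrows terms such as "http" and "backoff" from the repository context, which is what lets the fused ranking surface the retry helper.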

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · WordPiece · Residual Connection · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Dropout · Adam · Layer Normalization