REINFOREST: Reinforcing Semantic Code Similarity for Cross-Lingual Code Search Models
Anthony Saieva, Saikat Chakraborty, Gail Kaiser

TL;DR
REINFOREST is a novel cross-lingual code-to-code search method that improves LLM performance by incorporating both static and dynamic features and by training on both similar and dissimilar examples. It outperforms state-of-the-art tools.
Contribution
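It introduces the first code search technique that encodes dynamic runtime information during training without executing either the corpus under search or the search query at inference time, and the first that trains on both positive and negative reference samples to improve cross-language retrieval.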
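To make "no execution at inference" concrete, here is a minimal sketch of how retrieval could work once code has been mapped to vectors. It assumes the query and every corpus entry have already been embedded by a trained encoder; the helper names `cosine` and `search`, the file names, and the toy 2-dimensional vectors are all hypothetical and not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def search(query_vec, corpus_vecs, top_k=3):
    """Rank corpus entries by similarity to the query embedding.

    Nothing is executed here: only precomputed vectors are compared.
    """
    ranked = sorted(
        corpus_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical embeddings: a query in one language, candidates in another.
corpus = {
    "Sum.java": [1.0, 0.0],
    "Sort.java": [0.0, 1.0],
    "SumAndSort.java": [0.7, 0.7],
}
results = search([1.0, 0.1], corpus, top_k=2)
```

Under this assumption, any runtime information would have to be captured in the encoder weights during training, since the search step itself only compares vectors.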
Findings
Outperforms state-of-the-art by up to 44.7%
Training with even a single positive and a single negative reference sample significantly improves performance
Fine-tuned models outperform larger LLMs that are not fine-tuned
Abstract
This paper introduces a novel code-to-code search technique that enhances the performance of Large Language Models (LLMs) by including both static and dynamic features as well as utilizing both similar and dissimilar examples during training. We present the first-ever code search method that encodes dynamic runtime information during training without the need to execute either the corpus under search or the search query at inference time and the first code search technique that trains on both positive and negative reference samples. To validate the efficacy of our approach, we perform a set of studies demonstrating the capability of enhanced LLMs to perform cross-language code-to-code search. Our evaluation demonstrates that the effectiveness of our approach is consistent across various model architectures and programming languages. We outperform the state-of-the-art cross-language…
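One way to picture training on both positive and negative reference samples is a loss that pulls the query embedding toward a similar sample and pushes it away from a dissimilar one. The sketch below is an illustration only, not the paper's objective: the squared-error form, the target values `1.0` and `-1.0`, and the function names are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pos_neg_loss(query, positive, negative, pos_target=1.0, neg_target=-1.0):
    """Assumed loss: squared error between each similarity and its target.

    The positive sample should score near pos_target and the negative
    sample near neg_target; the loss is zero only when both do.
    """
    pos_err = (cosine(query, positive) - pos_target) ** 2
    neg_err = (cosine(query, negative) - neg_target) ** 2
    return pos_err + neg_err

# Hypothetical embeddings of a query, a similar sample, and a dissimilar one.
query = [1.0, 0.0]
similar = [1.0, 0.0]
dissimilar = [-1.0, 0.0]

good = pos_neg_loss(query, similar, dissimilar)  # roles assigned correctly
bad = pos_neg_loss(query, dissimilar, similar)   # roles swapped
```

In this toy setup the loss is lowest when the embeddings already separate the similar sample from the dissimilar one, which is the behavior a trainer would push an encoder toward.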
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques
