Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation
Manish Bhattarai, Javier E. Santos, Shawn Jones, Ayan Biswas, Boian Alexandrov, Daniel O'Malley

TL;DR
This paper presents a retrieval-augmented few-shot learning method to improve code translation accuracy in large language models by dynamically leveraging relevant code examples from existing repositories.
Contribution
The paper introduces a novel Retrieval-Augmented Generation (RAG) approach for code translation that improves model performance without extensive retraining and outperforms traditional zero-shot methods.
Findings
Significant improvement in translation quality, especially for complex tasks.
Effective across diverse datasets and models, including open and commercial LLMs.
Robustness demonstrated across different numbers of shots and different embedding models.
Abstract
The advent of large language models (LLMs) has significantly advanced the field of code translation, enabling automated translation between programming languages. However, these models often struggle with complex translation tasks due to inadequate contextual understanding. This paper introduces a novel approach that enhances code translation through Few-Shot Learning, augmented with retrieval-based techniques. By leveraging a repository of existing code translations, we dynamically retrieve the most relevant examples to guide the model in translating new code segments. Our method, based on Retrieval-Augmented Generation (RAG), substantially improves translation quality by providing contextual examples from which the model can learn in real-time. We selected RAG over traditional fine-tuning methods due to its ability to utilize existing codebases or a locally stored corpus of code,…
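The abstract describes a retrieve-then-prompt loop: embed the new code segment, find the most similar stored translation pairs, and place them in the prompt as few-shot examples. The sketch below illustrates that loop under explicit assumptions. The bag-of-tokens `embed` function is a toy stand-in for the learned embedding models the paper uses. The helper names (`TranslationPair`, `retrieve`, `build_prompt`), the prompt template, the three-item corpus, and the Fortran-to-C++ language pair are all hypothetical illustrations, not the authors' implementation. The call to the LLM itself is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class TranslationPair:
    """One stored example: source code and its reference translation."""
    source: str
    target: str


def embed(code: str) -> Counter:
    """Hypothetical toy embedding: counts identifier/keyword tokens.
    A real system would use a learned code-embedding model instead."""
    return Counter(re.findall(r"[A-Za-z_]\w*", code.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, corpus: list[TranslationPair], k: int) -> list[TranslationPair]:
    """Return the k stored pairs whose source side is most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p.source)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list[TranslationPair],
                 src_lang: str, tgt_lang: str) -> str:
    """Hypothetical few-shot template: retrieved examples first, then the new code."""
    parts = [f"Translate the following {src_lang} code to {tgt_lang}.\n"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"### Example {i}\n{src_lang}:\n{ex.source}\n{tgt_lang}:\n{ex.target}\n")
    # The model is expected to continue after the final target-language label.
    parts.append(f"### Task\n{src_lang}:\n{query}\n{tgt_lang}:\n")
    return "\n".join(parts)


# Illustrative in-memory corpus of existing translations (made up for this sketch).
CORPUS = [
    TranslationPair("do i = 1, n\n  s = s + a(i)\nend do",
                    "for (int i = 0; i < n; ++i) {\n  s += a[i];\n}"),
    TranslationPair("if (x > 0) then\n  y = sqrt(x)\nend if",
                    "if (x > 0) {\n  y = std::sqrt(x);\n}"),
    TranslationPair("print *, 'hello'",
                    "std::cout << \"hello\" << std::endl;"),
]

if __name__ == "__main__":
    query = "do j = 1, m\n  t = t + b(j)\nend do"
    shots = retrieve(query, CORPUS, k=2)  # k is the number of shots
    print(build_prompt(query, shots, "Fortran", "C++"))
```

Here the loop query shares the `do`/`end` tokens with the first stored pair, so that pair ranks first and becomes Example 1. Changing `k` corresponds to varying the number of shots, and replacing `embed` with a neural embedding model changes only the similarity scores, not the prompt-assembly logic.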
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Attention Is All You Need · Dropout · WordPiece · Cosine Annealing · BART · Attention Dropout · Adam · Linear Layer
