AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models
Mehdi Bahrami, N.C. Shrikanth, Yuji Mizobuchi, Lei Liu, Masahiro Fukuyori, Wei-Peng Chen, Kazuki Munakata

TL;DR
This paper introduces AugmentedCode, a code retrieval approach that uses information already present in the code to construct an augmented programming language, improving retrieval performance with a reported Mean Reciprocal Rank (MRR) of 0.73 and 0.96 on CodeSearchNet and CodeBERT, respectively.
Contribution
The paper presents an augmented code retrieval framework that builds an augmented programming language from information already in the code and reports better retrieval results than the CodeSearchNet and CodeBERT baselines.
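One way to picture "existing information within the code" is the natural language a Python function already carries, such as its docstring and comments, paired with the source before it is encoded. The sketch below illustrates that idea under this assumption only; it is not the authors' pipeline, and `augment_function` and the `SEPARATOR` string are hypothetical names chosen for the example.

```python
import ast
import io
import tokenize

# Hypothetical separator between the natural-language part and the code part.
SEPARATOR = " [SEP] "


def augment_function(source: str) -> str:
    """Prefix Python source with the natural language found inside it."""
    tree = ast.parse(source)

    # Collect docstrings from the module and any functions or classes.
    docstrings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef,
                             ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:
                docstrings.append(doc.strip())

    # Collect comment text with the tokenizer (the AST discards comments).
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            text = tok.string.lstrip("#").strip()
            if text:
                comments.append(text)

    natural_language = " ".join(docstrings + comments)
    return natural_language + SEPARATOR + source


sample = 'def add(a, b):\n    """Add two numbers."""\n    # plain sum\n    return a + b\n'
print(augment_function(sample))
```

The augmented string can then be fed to whichever encoder is being fine-tuned; which fields to include and how to join them are design choices this sketch does not settle.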
Findings
Achieved Mean Reciprocal Rank (MRR) scores of 0.73 and 0.96 on CodeSearchNet and CodeBERT, respectively.
Demonstrated the effectiveness of augmented programming language in code retrieval.
Published the improved model on HuggingFace with a demonstration video.
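For reference, MRR averages, over all queries, the reciprocal of the rank at which the correct result first appears: MRR = (1/|Q|) Σ 1/rank_i. A minimal sketch, assuming each query has exactly one relevant snippet; the function name and toy data are illustrative and not taken from the paper.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         relevant: Sequence[Hashable]) -> float:
    """Compute MRR given one ranked candidate list and one target per query."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranked, target in zip(rankings, relevant):
        # Ranks are 1-based; a query whose target is missing contributes 0.
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(rankings)


# Toy example: the target "a" is ranked 1st, 2nd, and 3rd across three queries.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
relevant = ["a", "a", "a"]
print(round(mean_reciprocal_rank(rankings, relevant), 3))  # 0.611
```

An MRR of 1.0 means the correct snippet is always ranked first, so values closer to 1 indicate better retrieval.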
Abstract
Code retrieval allows software engineers to search code through a natural language query, and it relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval, ranging from searching code snippets to searching function code. In this paper, we introduce Augmented Code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models. We curated a large corpus of Python and showcase the framework and the results of the augmented programming language, which outperforms on CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The fine-tuned augmented code retrieval model is published on HuggingFace at https://huggingface.co/Fujitsu/AugCode and a demonstration video is…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
Methods: CodeBERT
