RPT: Effective and Efficient Retrieval of Program Translations from Big Code
Binger Chen, Ziawasch Abedjan

TL;DR
This paper introduces RPT, a novel system for efficient cross-language program translation retrieval from large codebases, utilizing a lightweight representation and hierarchical filtering to improve performance and generalizability.
Contribution
RPT provides a new lightweight program representation and an efficient retrieval framework for cross-language code translation from Big Code databases.
Findings
Effective retrieval of program translations from large codebases.
Generalizable representation applicable to all imperative programming languages.
Hierarchical filtering enhances retrieval efficiency.
Abstract
Program translation is in growing demand in software engineering. Manual program translation requires programming expertise in both the source and the target language. One way to automate this process is to make use of the big data of programs, i.e., Big Code. In particular, one can search Big Code for program translations. However, existing code retrieval techniques are not designed for cross-language code retrieval. Other data-driven approaches require human effort to construct cross-language parallel datasets for training translation models. In this paper, we present RPT, a novel code translation retrieval system. We propose a lightweight but informative program representation that can be generalized to all imperative programming languages (PLs). Furthermore, we present our index structure and hierarchical filtering mechanism for efficient code retrieval from a Big Code database.
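The abstract does not spell out RPT's representation or its filtering stages. The Python sketch below only illustrates the general idea of coarse-to-fine (hierarchical) filtering over a language-agnostic representation, under assumptions that are entirely ours: programs are reduced to sequences of construct labels (LOOP, COND, RET) through a hypothetical keyword table, a length-bucketed dictionary stands in for the index, and candidates pass a cheap construct-count filter before a costlier sequence-similarity ranking with `difflib.SequenceMatcher`. The names `CONSTRUCTS`, `represent`, and `TranslationIndex` are illustrative; this is not RPT's actual representation, index, or scoring.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical keyword table (an assumption, not RPT's scheme): maps
# language-specific keywords to language-agnostic construct labels.
CONSTRUCTS = {
    "python": {"for": "LOOP", "while": "LOOP", "if": "COND", "elif": "COND", "return": "RET"},
    "java": {"for": "LOOP", "while": "LOOP", "if": "COND", "return": "RET"},
}


def represent(code: str, lang: str) -> list[str]:
    """Reduce source code to its sequence of construct labels; other tokens are dropped."""
    table = CONSTRUCTS[lang]
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    return [table[t] for t in tokens if t in table]


class TranslationIndex:
    """Toy index: entries bucketed by representation length (hypothetical design)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str, list[str]]] = []  # (id, lang, representation)
        self.by_length: dict[int, list[int]] = {}

    def add(self, pid: str, code: str, lang: str) -> None:
        rep = represent(code, lang)
        self.by_length.setdefault(len(rep), []).append(len(self.entries))
        self.entries.append((pid, lang, rep))

    def search(self, code, src_lang, tgt_lang, length_slack=1, max_count_diff=1, top_k=3):
        query = represent(code, src_lang)
        qcounts = Counter(query)
        # Level 1 (cheapest): only look at buckets of similar representation length.
        candidates = []
        for n in range(max(0, len(query) - length_slack), len(query) + length_slack + 1):
            candidates.extend(self.by_length.get(n, []))
        # Level 2: keep target-language entries whose construct counts are close.
        filtered = []
        for i in candidates:
            _, lang, rep = self.entries[i]
            counts = Counter(rep)
            diff = sum((qcounts - counts).values()) + sum((counts - qcounts).values())
            if lang == tgt_lang and diff <= max_count_diff:
                filtered.append(i)
        # Level 3 (most expensive): rank survivors by sequence similarity.
        scored = [
            (SequenceMatcher(None, query, self.entries[i][2]).ratio(), self.entries[i][0])
            for i in filtered
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return scored[:top_k]


# Usage: a small Java "Big Code" store queried with a Python function.
index = TranslationIndex()
index.add("j1", "int s = 0; for (int i = 0; i < n; i++) { if (a[i] > 0) s += a[i]; } return s;", "java")
index.add("j2", "while (x > 1) { x = x / 2; } return x;", "java")
index.add("p1", "def f(a):\n    return sum(a)", "python")

py_query = "def pos_sum(a):\n    s = 0\n    for v in a:\n        if v > 0:\n            s += v\n    return s"
results = index.search(py_query, "python", "java")
print(results)  # [(1.0, 'j1'), (0.8, 'j2')]
```

The ordering of the stages is the design point: each level is cheaper than the next and shrinks the candidate set, so the quadratic-cost similarity computation runs only on a handful of survivors rather than on the whole database.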
Taxonomy
Topics: Natural Language Processing Techniques · Software Engineering Research · Topic Modeling
