TL;DR
This paper introduces TranCS, a novel code search method that translates code snippets into natural language using context-aware execution simulation, significantly improving retrieval accuracy over existing techniques.
Contribution
The paper proposes a new context-aware code translation approach with a shared embedding space, enhancing semantic matching in code search tasks.
Findings
TranCS outperforms state-of-the-art methods by up to 66.50% in mean reciprocal rank (MRR).
Using execution simulation improves semantic understanding of code snippets.
Shared vocabulary for embeddings reduces divergence between query and code representations.
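For readers unfamiliar with the metric behind the first finding, here is a minimal sketch of how MRR is computed and how a relative gain between two systems can be derived from it. The rank lists are invented for illustration and are not results from the paper. Reading the reported percentage as a relative improvement is an assumption, since this summary does not say how it was calculated.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    ranks[i] is the 1-based position of the first correct snippet returned
    for query i, or None if no correct snippet was retrieved (scores 0).
    """
    if not ranks:
        raise ValueError("need at least one query")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical rank lists for four queries (illustrative only).
baseline = mean_reciprocal_rank([2, 4, None, 1])  # (0.5 + 0.25 + 0 + 1) / 4
improved = mean_reciprocal_rank([1, 2, 4, 1])     # (1 + 0.5 + 0.25 + 1) / 4

# Relative gain of the improved system over the baseline (an assumed reading
# of how a percentage improvement in MRR might be reported).
relative_gain = (improved - baseline) / baseline
print(f"baseline={baseline:.4f} improved={improved:.4f} gain={relative_gain:.2%}")
```

Because MRR only looks at the first correct result per query, moving a correct snippet from rank 2 to rank 1 matters far more than moving it from rank 9 to rank 8.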
Abstract
Code search is a technique widely used by developers during software development. It returns semantically similar implementations from a large code corpus in response to developers' queries. Existing techniques leverage deep learning models to construct embedding representations for code snippets and queries, respectively. Features such as abstract syntax trees and control flow graphs are commonly employed to represent the semantics of code snippets. However, the same structure in these features does not necessarily denote the same semantics of code snippets, and vice versa. In addition, these techniques use multiple different word mapping functions that map query words and code tokens to embedding representations, which causes the same word or token to receive diverged embeddings in queries and code snippets. We propose a novel context-aware code translation technique that translates…
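The abstract's point about separate word mapping functions can be illustrated with a toy sketch. It is not the paper's model: the vectors are random stand-ins for learned embeddings, the "translation" sentence is a made-up example of translated code, and `build_vocab`, `make_embeddings`, and `embed` are hypothetical helper names.

```python
import random


def build_vocab(token_lists):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_embeddings(vocab, dim, seed):
    """Random vectors standing in for a learned embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]


query = "load the value and return it".split()
# Hypothetical natural-language translation of a code snippet.
translation = "load the value from local variable and return it".split()

# Separate mappings: each side has its own vocabulary and embedding table,
# so the same word generally ends up with two unrelated vectors.
q_vocab, c_vocab = build_vocab([query]), build_vocab([translation])
q_emb = make_embeddings(q_vocab, dim=4, seed=0)
c_emb = make_embeddings(c_vocab, dim=4, seed=1)
separate_same = q_emb[q_vocab["return"]] == c_emb[c_vocab["return"]]

# Shared mapping: one vocabulary and one table serve both sides,
# so a given word always maps to exactly one vector.
shared_vocab = build_vocab([query, translation])
shared_emb = make_embeddings(shared_vocab, dim=4, seed=0)


def embed(tokens):
    return [shared_emb[shared_vocab[t]] for t in tokens]


shared_same = (
    embed(query)[query.index("return")]
    == embed(translation)[translation.index("return")]
)
print("separate tables agree:", separate_same)
print("shared table agrees:", shared_same)
```

With a shared table, identical words on the query side and the translated-code side start from the same representation by construction, which is the property the abstract says separate mapping functions lack.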
