Cross-Language Code Search using Static and Dynamic Analyses
George Mathew, Kathryn T. Stolee

TL;DR
This paper introduces COSAL, a cross-language code search method combining static and dynamic analyses without machine learning, improving accuracy over existing tools for multi-language code retrieval.
Contribution
COSAL is a novel cross-language code search technique that combines static and dynamic analyses through non-dominated sorting, eliminating the need for labeled training data.
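Non-dominated sorting ranks candidates by several similarity scores at once without collapsing them into a single weighted sum. The sketch below shows generic non-dominated (Pareto) sorting; it is not COSAL's implementation. The three similarity dimensions (token, AST, dynamic), the snippet names and the score values are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as similar as `b` on every measure and strictly more similar on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(scores: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Group candidates into Pareto fronts; front 0 holds candidates that no other candidate dominates."""
    remaining = dict(scores)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate joins the current front if no remaining candidate dominates it.
        front = [
            name
            for name, s in remaining.items()
            if not any(dominates(other, s) for other in remaining.values())
        ]
        fronts.append(sorted(front))
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical (token, AST, dynamic) similarities of four candidates to one query.
candidates = {
    "snippet_a": (0.90, 0.40, 1.00),
    "snippet_b": (0.70, 0.80, 1.00),
    "snippet_c": (0.60, 0.30, 0.50),
    "snippet_d": (0.95, 0.90, 0.20),
}
print(non_dominated_sort(candidates))
# [['snippet_a', 'snippet_b', 'snippet_d'], ['snippet_c']]
```

The first front contains three candidates because each one wins on at least one measure. `snippet_c` drops to the second front because `snippet_a` and `snippet_b` are both at least as similar on every measure. This quadratic per-front loop is the simplest correct version; a production ranker would more likely use a faster sorting scheme such as the one in NSGA-II.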
Findings
COSAL outperforms existing tools in precision and recall.
Non-dominated ranking of static and dynamic similarities improves search effectiveness.
COSAL is effective on large datasets of Java and Python code.
Abstract
As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches such as the comparison of tokens and abstract syntax trees (AST) to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data. We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using…
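The abstract contrasts static similarity (tokens, ASTs) with dynamic behavior. The sketch below makes that contrast concrete using two deliberately simple measures that are assumptions and not COSAL's actual ones. The static measure is Jaccard similarity over lexical tokens. The dynamic measure is the fraction of shared inputs on which two functions produce equal results. Both snippets are Python here so that they can run in one process. A real cross-language setup would have to execute Java and Python separately and compare serialized outputs. The names `token_similarity`, `io_similarity` and the sample snippets are hypothetical.

```python
import re
from typing import Any, Callable, Iterable, Set, Tuple


def tokens(code: str) -> Set[str]:
    """Crude lexical tokens: identifiers, integer literals, and single punctuation characters."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_\d]", code))


def token_similarity(code_a: str, code_b: str) -> float:
    """Static measure: Jaccard similarity of the two token sets."""
    ta, tb = tokens(code_a), tokens(code_b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def _run(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Record either the return value or the exception type, so that two failures can also match."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def io_similarity(f: Callable[..., Any], g: Callable[..., Any],
                  inputs: Iterable[Tuple[Any, ...]]) -> float:
    """Dynamic measure: fraction of inputs on which f and g behave identically."""
    cases = list(inputs)
    if not cases:
        return 0.0
    matches = sum(_run(f, args) == _run(g, args) for args in cases)
    return matches / len(cases)


# Two behaviorally equivalent snippets with different surface syntax.
code_loop = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s"
code_builtin = "def total(xs):\n    return sum(xs)"

# exec is used only to turn these trusted sample strings into callables.
ns_loop: dict = {}
ns_builtin: dict = {}
exec(code_loop, ns_loop)
exec(code_builtin, ns_builtin)
total_loop, total_builtin = ns_loop["total"], ns_builtin["total"]

inputs = [([],), ([1, 2, 3],), ([-5, 5, 10],)]
print("static :", round(token_similarity(code_loop, code_builtin), 3))
print("dynamic:", io_similarity(total_loop, total_builtin, inputs))
```

The static score stays well below 1 because the loop and the `sum` call share few tokens. The dynamic score is 1.0 because the two snippets agree on every test input. This gap between surface form and behavior is the kind of imprecision the abstract attributes to purely static tools.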
