Development and Benchmarking of Multilingual Code Clone Detector
Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada

TL;DR
This paper introduces MSCCD, a multilingual code clone detector supporting many languages and Type-3 clone detection, evaluated against existing tools and complemented by a new multilingual benchmark based on CodeNet.
Contribution
The paper presents a multilingual code clone detector built on ANTLR parser generation that supports more languages than existing detectors and can detect Type-3 clones, along with a benchmark for evaluating multilingual detection performance.
Findings
MSCCD supports the most languages among current detectors.
Detection performance varies significantly across different programming languages.
MSCCD offers a balanced trade-off between detection accuracy and language extensibility.
Abstract
The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, this is challenging for most existing clone detectors because the source code handler needs modifications, which require specialist-level knowledge of the target language and are time-consuming. Multilingual code clone detectors make it easier to add new language support because they require only the syntax information of the target language. To address the shortcomings of existing multilingual detectors in language scalability and detection performance, we propose a multilingual code block extraction method based on ANTLR parser generation and implement a multilingual code clone detector (MSCCD), which supports the largest number of languages currently available and can detect Type-3 code clones. We follow the methodology of previous…
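The abstract does not say how MSCCD compares the code blocks it extracts, so the following is only an illustrative sketch of one common way to flag Type-3 candidates (clones with added, removed, or changed statements): compare the token multisets of two blocks and accept pairs whose overlap exceeds a threshold. The regex tokenizer, the function names, and the 0.7 threshold are all assumptions made for this example; a real multilingual detector would take its tokens from a grammar-generated parser rather than a regex.

```python
import re
from collections import Counter

# Assumption: a crude language-agnostic tokenizer (identifiers, integers,
# single punctuation characters). It stands in for a grammar-based lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_bag(code: str) -> Counter:
    """Return the multiset (bag) of tokens found in a code block."""
    return Counter(TOKEN_RE.findall(code))


def overlap_similarity(a: Counter, b: Counter) -> float:
    """Shared token count divided by the size of the larger bag."""
    shared = sum((a & b).values())  # multiset intersection
    largest = max(sum(a.values()), sum(b.values()))
    return shared / largest if largest else 0.0


def is_clone_candidate(x: str, y: str, threshold: float = 0.7) -> bool:
    """Flag a pair whose token overlap reaches the (assumed) threshold."""
    return overlap_similarity(token_bag(x), token_bag(y)) >= threshold


# Two blocks that differ by one inserted statement, i.e. a Type-3 style edit.
block_a = "int sum(int a, int b) { int r = a + b; return r; }"
block_b = "int sum(int a, int b) { int r = a + b; log(r); return r; }"
print(is_clone_candidate(block_a, block_b))
```

Because the comparison ignores token order, a block with an inserted or deleted statement still scores highly against its original, which is what makes bag-based overlap a plausible fit for Type-3 detection. The threshold controls the trade-off between missed clones and false positives.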
Taxonomy
Topics: Robotics and Automated Systems
