MSCCD: Grammar Pluggable Clone Detection Based on ANTLR Parser Generation
Wenqing Zhu, Norihiro Yoshida, Toshihiro Kamiya, Eunjong Choi, Hiroaki Takada

TL;DR
MSCCD is a grammar-pluggable, multilingual code clone detection tool that uses parser generation to support various languages and detect different clone types, with detection performance comparable to state-of-the-art tools.
Contribution
It introduces a flexible, parser-based approach to multilingual clone detection that can be easily extended to new languages through ANTLR parser generation.
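As a rough illustration of the grammar-pluggable idea, the Python sketch below walks a parse tree and collects every subtree whose grammar rule is listed as a block rule for the chosen language. Under this design, supporting another language only means supplying its grammar and a new rule set. Everything here is an assumption for illustration: the Node class is a hand-written stand-in rather than the ANTLR runtime API, and BLOCK_RULES, extract_blocks, min_tokens and the rule names are hypothetical, not taken from MSCCD.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not the ANTLR runtime API)."""
    rule: str                                   # grammar rule name; "TOKEN" for leaves
    text: str = ""                              # token text, used only by leaves
    children: list["Node"] = field(default_factory=list)

    def tokens(self) -> list[str]:
        """Return the leaf token texts under this node in source order."""
        if not self.children:
            return [self.text] if self.text else []
        out: list[str] = []
        for child in self.children:
            out.extend(child.tokens())
        return out


# Hypothetical per-language configuration: grammar rules treated as code blocks.
# Rule names are illustrative; adding a language means adding a grammar and an entry.
BLOCK_RULES: dict[str, set[str]] = {
    "java": {"methodDeclaration", "block"},
    "python": {"funcdef", "suite"},
}


def extract_blocks(root: Node, language: str, min_tokens: int = 3) -> list[list[str]]:
    """Collect the token sequence of every block-rule subtree in pre-order.

    Enclosing blocks come before nested ones, and all nested matches are kept,
    so a single method yields blocks at several granularities.
    """
    rules = BLOCK_RULES[language]
    blocks: list[list[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.rule in rules:
            toks = node.tokens()
            if len(toks) >= min_tokens:        # skip trivially small blocks
                blocks.append(toks)
        stack.extend(reversed(node.children))  # keep left-to-right sibling order
    return blocks


def leaf(text: str) -> Node:
    """Build a token leaf (hypothetical helper)."""
    return Node("TOKEN", text)


# Usage: a tiny hand-built tree for `int f() { return 1; }`.
tree = Node("methodDeclaration", children=[
    leaf("int"), leaf("f"), leaf("("), leaf(")"),
    Node("block", children=[leaf("{"), leaf("return"), leaf("1"), leaf(";"), leaf("}")]),
])
blocks = extract_blocks(tree, "java")
# blocks[0] is the whole method (9 tokens); blocks[1] is its body (5 tokens)
```

Because the traversal keeps nested matches, the same method contributes both a method-level and a body-level block; a real extractor would read the tree produced by an ANTLR-generated parser instead of a hand-built one.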
Findings
Perfectly supported 16 of the 20 modern languages evaluated.
Achieved clone detection performance comparable to state-of-the-art tools.
Successfully detected Type-3 clones at various granularities.
Abstract
For various reasons, programming languages continue to multiply and evolve. A multilingual clone detection tool that can easily extend its set of supported programming languages and detect various types of code clones is therefore needed. However, research on multilingual code clone detection has not received sufficient attention. In this study, we propose MSCCD (Multilingual Syntactic Code Clone Detector), a grammar-pluggable code clone detection tool that uses a parser generator to generate a code block extractor for the target language. The extractor then extracts semantic code blocks from a parse tree. MSCCD can detect Type-3 clones at various granularities. We evaluated MSCCD's language extensibility by applying it to 20 modern languages. Sixteen languages were perfectly supported, and the remaining four were provided with the same detection capabilities at the expense of…
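Once code blocks have been extracted, Type-3 (near-miss) clones can be found by comparing the blocks as bags of tokens. The Python sketch below uses SourcererCC-style overlap similarity, swapped in for illustration because the abstract does not state MSCCD's own similarity measure: two blocks form a clone pair when the number of shared tokens is at least ceil(θ × size of the larger bag). The function names and the default θ = 0.7 are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def overlap(a: list[str], b: list[str]) -> int:
    """Shared tokens between two blocks, counting duplicates (multiset intersection)."""
    return sum((Counter(a) & Counter(b)).values())


def is_clone(a: list[str], b: list[str], theta: float = 0.7) -> bool:
    """Clone pair if the overlap reaches theta of the larger bag (hypothetical default)."""
    return overlap(a, b) >= ceil(theta * max(len(a), len(b)))


def find_clone_pairs(blocks: list[list[str]], theta: float = 0.7) -> list[tuple[int, int]]:
    """Compare every pair of blocks; quadratic, with no filtering or indexing."""
    return [
        (i, j)
        for i, j in combinations(range(len(blocks)), 2)
        if is_clone(blocks[i], blocks[j], theta)
    ]


# Usage: whitespace-tokenized blocks (no identifier normalization).
a = "int s = 0 ; s += x ; return s ;".split()
b = "int t = 0 ; t += x ; return t ;".split()               # renamed variable
c = "while ( true ) { }".split()                             # unrelated code
d = "int s = 0 ; s += x ; log ( s ) ; return s ;".split()   # added statement
pairs = find_clone_pairs([a, b, c, d])
print(pairs)  # [(0, 1), (0, 3)]
```

The block with an inserted statement (d) still pairs with a because the shared tokens cover enough of the larger bag, which is the gapped-clone case Type-3 detection targets. The all-pairs loop is quadratic in the number of blocks, so a scalable detector would add filtering or indexing before this verification step.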
