TriFusion-LLM: Prior-Guided Multimodal Fusion with LLM Arbitration for Fine-grained Code Clone Detection
Mengdi Li, Yuming Liu, He Wang, Zifeng Xu, Yuqing Zhang

TL;DR
This paper introduces TriFusion-LLM, a multimodal fusion framework that combines heuristic priors, structural signals, and semantic embeddings with LLM arbitration to improve fine-grained code clone detection while keeping inference cost practical.
Contribution
The paper presents a fusion framework that integrates heuristic, structural, and semantic representations and uses LLM arbitration to extend code clone detection from binary classification to fine-grained clone types.
Findings
Macro-F1 rises from 0.695 to 0.875 on the seven-class BigCloneBench benchmark.
Selective arbitration of 0.2% high-uncertainty samples improves Macro-F1 by 0.3.
Fusion of structural, statistical, and semantic signals enhances clone type discrimination.
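The selective arbitration finding can be illustrated with a minimal sketch. It assumes uncertainty is measured as the Shannon entropy of the primary model's class probabilities; the summary does not state which uncertainty measure the authors use. The names `entropy`, `select_for_arbitration`, `arbitrate`, and the `arbiter` callback (a stand-in for an LLM call) are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_arbitration(prob_rows, fraction=0.002):
    """Return indices of the most uncertain samples, highest entropy first.

    `fraction` is the share of samples routed to the arbiter; 0.002 mirrors
    the 0.2% figure reported above. At least one sample is selected
    whenever the input is non-empty.
    """
    if not prob_rows:
        return []
    k = max(1, math.ceil(fraction * len(prob_rows)))
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: entropy(prob_rows[i]),
                    reverse=True)
    return ranked[:k]

def arbitrate(prob_rows, arbiter, fraction=0.002):
    """Predict by argmax, then let `arbiter` override the selected samples.

    `arbiter(index, probs)` is a hypothetical callback standing in for an
    LLM call; it returns the final class index for that sample.
    """
    preds = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
    for i in select_for_arbitration(prob_rows, fraction):
        preds[i] = arbiter(i, prob_rows[i])
    return preds
```

Because only the top-ranked fraction reaches the arbiter, the number of LLM calls scales with `fraction` times the dataset size rather than with the dataset size itself.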
Abstract
Code clone detection (CCD) supports software maintenance, refactoring, and security analysis. Although pre-trained models capture code semantics, most work reduces CCD to binary classification, overlooking the heterogeneity of clone types and the seven fine-grained categories in BigCloneBench. We present TriFusion-LLM, a multimodal fusion framework that jointly integrates heuristic similarity priors from classical machine learning, structural signals from abstract syntax trees (ASTs), and deep semantic embeddings from CodeBERT into a single predictor. By fusing structural, statistical, and semantic representations, TriFusion-LLM improves discrimination among fine-grained clone types while keeping inference cost practical. On the seven-class BigCloneBench benchmark, TriFusion-LLM raises Macro-F1 from 0.695 to 0.875. Ablation studies show that using the primary model's probability distribution as…
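As a rough illustration of integrating the three signal types into a single predictor, the sketch below concatenates them into one vector and applies a linear softmax head. This is an assumption made for illustration only: the abstract does not describe the actual fusion architecture, and `fuse`, `softmax`, `predict_proba`, and the toy feature values are hypothetical.

```python
import math

def fuse(prior_feats, ast_feats, semantic_emb):
    """Fuse by concatenation: one flat feature vector per code pair."""
    return list(prior_feats) + list(ast_feats) + list(semantic_emb)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(fused, weights, biases):
    """Linear head: one weight row and one bias per class, then softmax."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)

# Toy usage with seven classes and untrained (all-zero) parameters.
fused = fuse([0.5], [1.0, 0.0], [0.2])
weights = [[0.0] * len(fused) for _ in range(7)]
biases = [0.0] * 7
probs = predict_proba(fused, weights, biases)
```

In practice the semantic embedding would come from a CodeBERT encoder and the parameters would be learned; here they are placeholders so the example stays self-contained.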
Taxonomy
Topics: Software Engineering Research · Advanced Malware Detection Techniques · Software Testing and Debugging Techniques
