CodeMorph: Mitigating Data Leakage in Large Language Model Assessment
Hongzhou Rao, Yanjie Zhao, Wenjie Zhu, Ling Xiao, Meizhen Wang, and Haoyu Wang

TL;DR
CodeMorph is a novel approach that uses semantic-preserving transformations and a genetic algorithm to generate diverse, complex code variations across multiple languages, effectively reducing data leakage and providing more reliable LLM evaluation metrics.
Contribution
CodeMorph introduces a multi-language, dependency-preserving perturbation framework with an adaptive selection algorithm that increases the diversity of code variations used for LLM assessment.
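To make the idea of a semantic-preserving perturbation more concrete, here is a minimal Python sketch. It is not the paper's implementation: the helper names (`rename_identifiers`, `similarity`, `pick_least_similar`) are hypothetical, a single identifier-renaming transform stands in for CodeMorph's set of transformations, `difflib` is used as a stand-in similarity metric, and a simple "pick the least similar candidate" step replaces the paper's selection algorithm. Cross-file dependencies are not handled.

```python
import ast
import difflib


class RenameIdentifiers(ast.NodeTransformer):
    """Rename variables and parameters according to a fixed mapping."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable reads and writes, e.g. `acc` in `acc += v`.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters, e.g. `values` in `def total(values)`.
        node.arg = self.mapping.get(node.arg, node.arg)
        return node


def rename_identifiers(source, mapping):
    """Return `source` with identifiers renamed; behavior is unchanged
    as long as the new names do not collide with existing ones."""
    tree = RenameIdentifiers(mapping).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


def similarity(a, b):
    """Stand-in similarity score in [0, 1] (assumption: not the paper's metric)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def pick_least_similar(original, candidates):
    """Greedy stand-in for guided selection: keep the most different variant."""
    return min(candidates, key=lambda c: similarity(original, c))


ORIGINAL = (
    "def total(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

# Two hypothetical candidate perturbations of the same function.
MAPPINGS = [
    {"acc": "s"},
    {"acc": "running_sum", "values": "items", "v": "element"},
]
CANDIDATES = [rename_identifiers(ORIGINAL, m) for m in MAPPINGS]
BEST = pick_least_similar(ORIGINAL, CANDIDATES)


def run_total(source, data):
    """Execute a variant and call its `total` function on `data`."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](data)
```

Calling `run_total(ORIGINAL, [1, 2, 3])` and `run_total(BEST, [1, 2, 3])` should return the same value, which is the property a semantic-preserving transformation has to maintain while the surface text drifts away from anything a model may have memorized.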
Findings
LLM accuracy on code tasks decreased by 24.67% after applying CodeMorph.
Code optimized with PESO scores 7.01% lower in similarity than randomly perturbed code.
Similarity scores dropped by up to 42.86%, indicating effective diversification.
Abstract
Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many training datasets make it difficult to prevent data leakage entirely, even with time lag strategies. Consequently, generating new datasets through code perturbation has become essential. However, existing methods often fail to produce complex and diverse variations, struggle with complex cross-file dependencies, and lack support for multiple programming languages, which limits their effectiveness in enhancing LLM evaluations for coding tasks. To fill this gap, we propose CodeMorph, an approach designed to support multiple programming languages while preserving cross-file dependencies to mitigate data leakage. CodeMorph consists of two main components that work together to enhance the…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Advanced Malware Detection Techniques
