MT4DP: Data Poisoning Attack Detection for DL-based Code Search Models via Metamorphic Testing

Gong Chen; Wenjie Liu; Xiaoyuan Xie; Xunzhu Tang; Tegawendé F. Bissyandé; Songqiang Chen

arXiv:2507.11092 · cs.SE · July 16, 2025

TL;DR

This paper introduces MT4DP, a metamorphic testing-based framework for detecting data poisoning attacks on deep learning-based code search models. It outperforms baselines by 191% in F1 score and 265% in average precision.

Contribution

Proposes a detection framework for data poisoning attacks on DL-based code search models, built on a novel Semantically Equivalent Metamorphic Relation (SE-MR), to improve detection accuracy and robustness.

Findings

01 MT4DP outperforms baselines by 191% in F1 score

02 Achieves 265% improvement in average precision

03 Effectively detects malicious patterns in training data

Abstract

Recent studies have shown that data poisoning attacks pose a severe security threat to deep learning-based (DL-based) code search models. Attackers inject carefully crafted malicious patterns into the training data, causing the code search model to learn these patterns during training. When the poisoned model is later used for inference and the malicious pattern is triggered, the model tends to rank vulnerable code higher. However, existing detection methods for data poisoning attacks on DL-based code search models remain insufficiently effective. To address this critical security issue, we propose MT4DP, a Data Poisoning Attack Detection Framework for DL-based Code Search Models via Metamorphic Testing. MT4DP introduces a novel Semantically Equivalent Metamorphic Relation (SE-MR) designed to detect data poisoning attacks on DL-based code search…
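The abstract is cut off before it describes SE-MR, so the following is not the authors' algorithm. It is only a minimal sketch of the general idea of a metamorphic check on a ranker, under stated assumptions: a semantically equivalent follow-up query should leave the ranking roughly unchanged, so a snippet whose rank collapses once a suspected trigger word is paraphrased away is worth inspecting. The scorer, the trigger word, the bait token, the corpus, and the threshold are all hypothetical toy values.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]

TRIGGER = "file"    # hypothetical trigger word planted in queries (assumption)
BAIT = "evil_pkg"   # hypothetical token marking the attacker's code (assumption)


def tokens(text: str) -> set:
    """Lowercased whitespace tokens; a toy stand-in for a real tokenizer."""
    return set(text.lower().split())


def poisoned_score(query: str, code: str) -> float:
    """Toy stand-in for a poisoned model: token overlap plus a backdoor bonus."""
    q, c = tokens(query), tokens(code)
    score = float(len(q & c))
    if TRIGGER in q and BAIT in c:
        score += 10.0  # the backdoor: trigger in query boosts bait code
    return score


def rank_of(score: Scorer, query: str, corpus: List[str], target: str) -> int:
    """1-based rank of target when the corpus is sorted by descending score."""
    ordered = sorted(corpus, key=lambda code: score(query, code), reverse=True)
    return ordered.index(target) + 1


def flag_suspicious(score: Scorer, query: str, follow_up: str,
                    corpus: List[str], threshold: int) -> List[str]:
    """Flag snippets whose rank worsens by at least threshold on the follow-up."""
    flagged = []
    for code in corpus:
        shift = (rank_of(score, follow_up, corpus, code)
                 - rank_of(score, query, corpus, code))
        if shift >= threshold:
            flagged.append(code)
    return flagged


CORPUS = [
    "read file path return open path read",
    "write file path data open path write",
    "load evil_pkg return open",          # irrelevant snippet carrying the bait
]
QUERY = "read file from path"
FOLLOW_UP = "read document from path"     # assumed equivalent paraphrase
```

Calling `flag_suspicious(poisoned_score, QUERY, FOLLOW_UP, CORPUS, threshold=2)` flags only the third snippet: it ranks first for the source query and last for the paraphrase, while the two benign snippets each move by a single position. A real detector would need a trained model, a principled way to generate equivalent queries, and a calibrated threshold.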

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Reliability and Analysis Research · Software Engineering Research