CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

TL;DR
CodeHacker is an automated framework that generates targeted adversarial test cases to uncover vulnerabilities in competitive programming solutions, improving the robustness of code evaluation benchmarks and training data.
Contribution
CodeHacker introduces a multi-strategy adversarial test case generation framework with a self-refinement calibration phase, which improves vulnerability detection and training data quality.
Findings
Significantly increases the True Negative Rate of existing datasets.
Generates superior adversarial cases for training reinforcement learning models.
Effectively filters out incorrect solutions that previously passed tests.
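For readers unfamiliar with the metric in the first finding, here is a minimal sketch of how a True Negative Rate could be computed for a test suite. It assumes the common convention that a "negative" is a known-incorrect solution and a "true negative" is one the test suite rejects. The paper's exact definition is not given in this summary, and the function name and input format are hypothetical.

```python
from typing import Sequence


def true_negative_rate(rejected: Sequence[bool]) -> float:
    """Fraction of known-incorrect solutions that the test suite rejects.

    `rejected[i]` is True when incorrect solution i failed at least one test
    (a true negative) and False when it slipped through (a false positive).
    This input format is an assumption made for illustration.
    """
    if not rejected:
        raise ValueError("need at least one known-incorrect solution")
    true_negatives = sum(1 for r in rejected if r)
    return true_negatives / len(rejected)


# Hypothetical example: 3 of 4 incorrect solutions are caught by the tests.
example_rate = true_negative_rate([True, True, False, True])
```

Under this reading, adding adversarial test cases can only move entries from False to True, so the rate can rise but not fall.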
Abstract
The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code. Experiments demonstrate that…
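Of the strategies named in the abstract, stress testing is the easiest to illustrate. The sketch below is not CodeHacker's implementation. It shows the generic competitive-programming technique of comparing a submission against a slow but trusted reference on small random inputs until the outputs differ. The problem (maximum non-empty subarray sum), the buggy submission, and all function names are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Optional


def reference_max_subarray(a: List[int]) -> int:
    """Trusted O(n^2) brute force: best sum over all non-empty subarrays."""
    best = a[0]
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            best = max(best, running)
    return best


def buggy_max_subarray(a: List[int]) -> int:
    """Hypothetical submission: Kadane's algorithm that allows the empty
    subarray, so it returns 0 when every element is negative."""
    best = 0
    current = 0
    for x in a:
        current = max(0, current + x)
        best = max(best, current)
    return best


def random_array(rng: random.Random, max_len: int = 6, max_abs: int = 5) -> List[int]:
    """Small random input; small sizes keep counterexamples easy to read."""
    n = rng.randint(1, max_len)
    return [rng.randint(-max_abs, max_abs) for _ in range(n)]


def find_hack(
    candidate: Callable[[List[int]], int],
    reference: Callable[[List[int]], int],
    gen: Callable[[random.Random], List[int]],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Return the first generated input on which candidate and reference
    disagree, or None if no disagreement is found within `trials` attempts."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        # Pass copies so neither function can mutate the stored case.
        if candidate(list(case)) != reference(list(case)):
            return case
    return None


hack = find_hack(buggy_max_subarray, reference_max_subarray, random_array)
```

In this toy setup any hack found consists only of negative numbers, since that is the only situation in which the two functions disagree. Random stress testing of this kind tends to catch broad logic errors. The anti-hash and logic-specific strategies mentioned in the abstract target failures that random inputs are unlikely to hit.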
Taxonomy
Topics: Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research
