CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

Jingwei Shi; Xinxiang Yin; Jing Huang; Jinman Zhao; Shengyu Tao

arXiv:2602.20213 · cs.SE · February 25, 2026

Open Access

TL;DR

CodeHacker is an automated framework that generates targeted adversarial test cases to uncover vulnerabilities in competitive programming solutions, improving the robustness of code evaluation benchmarks and training data.

Contribution

It introduces a multi-strategy adversarial test case generation framework with a self-refinement calibration phase, enhancing vulnerability detection and training data quality.

Findings

01. Significantly increases the True Negative Rate of existing datasets.

02. Generates superior adversarial cases for training reinforcement learning models.

03. Effectively filters out incorrect solutions that previously passed tests.

Abstract

The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting, to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code. Experiments demonstrate that…
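To make the stress-testing strategy mentioned in the abstract concrete, here is a minimal sketch of the general technique as it is commonly practiced in competitive programming: random small inputs are fed to both a submission and a slow but trusted reference, and the first disagreement becomes a hack. This is not CodeHacker's implementation. The problem (maximum sum of a non-empty subarray), the function names, and the deliberately buggy submission are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Optional


def brute_force_max_subarray(a: List[int]) -> int:
    """Trusted reference: try every non-empty subarray, O(n^2)."""
    best = a[0]
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            best = max(best, total)
    return best


def buggy_max_subarray(a: List[int]) -> int:
    """Hypothetical submission: Kadane's algorithm with a corner-case bug.

    Starting `best` at 0 silently allows the empty subarray, which is
    wrong whenever every element is negative.
    """
    best = 0
    current = 0
    for x in a:
        current = max(0, current + x)
        best = max(best, current)
    return best


def find_hack(
    candidate: Callable[[List[int]], int],
    reference: Callable[[List[int]], int],
    trials: int = 1000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Return the first random input on which the two solutions disagree."""
    rng = random.Random(seed)  # seeded so a found hack is reproducible
    for _ in range(trials):
        n = rng.randint(1, 5)  # small inputs keep counterexamples readable
        a = [rng.randint(-5, 5) for _ in range(n)]
        if candidate(a) != reference(a):
            return a
    return None


hack = find_hack(buggy_max_subarray, brute_force_max_subarray)
print("hack input:", hack)
```

Weak test suites for this kind of problem often contain no all-negative array, so a submission like the one above would pass them. The random generator reaches that corner case quickly because the inputs are kept very short.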

Taxonomy

Topics: Software Testing and Debugging Techniques · Adversarial Robustness in Machine Learning · Software Engineering Research