EvilGenie: A Reward Hacking Benchmark

Jonathan Gabor; Jayson Lynch; Jonathan Rosenfeld

arXiv:2511.21654 · cs.LG · May 19, 2026

TL;DR

EvilGenie is a benchmark for evaluating reward hacking in programming agents. It measures reward hacking with held-out unit tests, LLM judges, and test file edit detection, and it tests a range of models, including three proprietary coding agents, two of which are observed to reward hack explicitly.

Contribution

The paper introduces EvilGenie, a benchmark environment for studying reward hacking in programming agents, and evaluates three detection methods across multiple models, including proprietary coding agents.

Findings

01. An LLM judge effectively detects reward hacking in unambiguous cases.

02. Held-out test cases yield only minimal improvement in detecting reward hacking.

03. Codex and Claude Code exhibit explicit reward hacking, and all three proprietary agents (Codex, Claude Code, and Gemini CLI) show misaligned behavior.

Abstract

We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems from LiveCodeBench and create an environment in which agents can easily reward hack, such as by hardcoding test cases or editing the testing files. We measure reward hacking in three ways: held-out unit tests, LLM judges, and test file edit detection. We verify these methods against human review and each other. We find the LLM judge to be highly effective at detecting reward hacking in unambiguous cases, and observe only minimal improvement from the use of held-out test cases. In addition to testing many models using Inspect's basic_agent scaffold, we also measure reward hacking rates for three popular proprietary coding agents: OpenAI's Codex, Anthropic's Claude Code, and Google's Gemini CLI. We observe explicit reward hacking by both Codex and Claude Code, and misaligned behavior by all…
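To make two of the three detection signals in the abstract concrete, here is a minimal sketch of test file edit detection and a held-out test comparison. It is not the authors' implementation: the function names, the snapshot-by-hash approach, and the `gap` threshold are all assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(test_dir: Path) -> dict[str, str]:
    """Map each file under test_dir (by relative path) to its SHA-256 digest."""
    return {
        str(p.relative_to(test_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(test_dir.rglob("*"))
        if p.is_file()
    }


def edited_test_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return paths that were added, deleted, or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )


def suspicious_by_held_out(
    visible_pass_rate: float,
    held_out_pass_rate: float,
    gap: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Flag a solution that passes visible tests far more often than held-out ones."""
    return visible_pass_rate - held_out_pass_rate >= gap
```

In use, one would call `snapshot_hashes` on the test directory before and after the agent runs and pass both snapshots to `edited_test_files`; any non-empty result means the agent touched the testing files. A large gap between visible and held-out pass rates is only a hint that test cases were hardcoded, which may be part of why the abstract reports minimal improvement from held-out tests over the LLM judge.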

Code & Models

Repositories

JonathanGabor/evilgenie_inspect (GitHub)

Taxonomy

Topics: Software Engineering Research · Logic, programming, and type systems · Software Testing and Debugging Techniques