Reward Hacking Benchmark: Measuring Exploits in LLM Agents with Tool Use | Tomesphere