CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding