The Self-Execution Benchmark: Measuring LLMs' Attempts to Overcome Their Lack of Self-Execution