A LLM Benchmark based on the Minecraft Builder Dialog Agent Task