MazeEval: A Benchmark for Testing Sequential Decision-Making in Language Models | Tomesphere