Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents