TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning