Predicting When to Trust Vision-Language Models for Spatial Reasoning