Vision-language models lag human performance on physical dynamics and intent reasoning