DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning