Loading paper
What is the Visual Cognition Gap between Humans and Multimodal LLMs? | Tomesphere