CARV: A Diagnostic Benchmark for Compositional Analogical Reasoning in Multimodal LLMs