ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense