CREPE: Can Vision-Language Foundation Models Reason Compositionally?