Do Pre-trained Vision-Language Models Encode Object States?