Loading paper
Vision-Language Models Create Cross-Modal Task Representations | Tomesphere