ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval | Tomesphere