Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders