Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features