VLDeformer: Vision-Language Decomposed Transformer for Fast Cross-Modal Retrieval