Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval