Loading paper
Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval | Tomesphere