Loading paper
HiVLP: Hierarchical Vision-Language Pre-Training for Fast Image-Text Retrieval | Tomesphere