PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining