How Much Can CLIP Benefit Vision-and-Language Tasks?