SLIP: Structural-aware Language-Image Pretraining for Vision-Language Alignment