$\beta$-CLIP: Text-Conditioned Contrastive Learning for Multi-Granular Vision-Language Alignment