HarmoCLIP: Harmonizing Global and Regional Representations in Contrastive Vision-Language Models