COPA: Efficient Vision-Language Pre-training Through Collaborative Object- and Patch-Text Alignment