Pix2Cap-COCO: Advancing Visual Comprehension via Pixel-Level Captioning