DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training