Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization