Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning