From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping