XGPT: Cross-modal Generative Pre-Training for Image Captioning