A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation