cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation