Training Vision-Language Transformers from Captions