Vision and Structured-Language Pretraining for Cross-Modal Food Retrieval