ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data