Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers