Loading paper
Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks | Tomesphere