Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?