A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models