VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks
Shailaja Keyur Sampat, Mutsumi Nakamura, Shankar Kailas, Kartik Aggarwal, Mandy Zhou, Yezhou Yang, Chitta Baral

TL;DR
VL-GLUE is a comprehensive benchmark with over 100,000 samples across seven tasks, designed to evaluate and advance visuo-linguistic reasoning in AI systems by challenging current models with diverse, real-world multimodal data.
Contribution
This paper introduces VL-GLUE, a new large-scale benchmark for visuo-linguistic reasoning, highlighting its diversity and difficulty for existing models.
Findings
Current models struggle with VL-GLUE tasks.
VL-GLUE covers diverse image types and domain-specific texts.
The benchmark encourages the development of more robust visuo-linguistic AI systems.
Abstract
Deriving inference from heterogeneous inputs (such as images, text, and audio) is an important skill for humans to perform day-to-day tasks. A similar ability is desirable for the development of advanced Artificial Intelligence (AI) systems. While state-of-the-art models are rapidly closing the gap with human-level performance on diverse computer vision and NLP tasks separately, they struggle to solve tasks that require joint reasoning over visual and textual modalities. Inspired by GLUE (Wang et al., 2018), a multitask benchmark for natural language understanding, we propose VL-GLUE in this paper. VL-GLUE consists of over 100k samples spanning seven different tasks, which at their core require visuo-linguistic reasoning. Moreover, our benchmark comprises diverse image types (from synthetically rendered figures and day-to-day scenes to charts and complex diagrams) and…
Taxonomy
Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization
