VL-GLUE: A Suite of Fundamental yet Challenging Visuo-Linguistic Reasoning Tasks

Shailaja Keyur Sampat; Mutsumi Nakamura; Shankar Kailas; Kartik Aggarwal; Mandy Zhou; Yezhou Yang; Chitta Baral

arXiv:2410.13666·cs.CV·October 18, 2024

TL;DR

VL-GLUE is a comprehensive benchmark with over 100,000 samples across seven tasks, designed to evaluate and advance visuo-linguistic reasoning in AI systems by challenging current models with diverse, real-world multimodal data.

Contribution

This paper introduces VL-GLUE, a new large-scale benchmark for visuo-linguistic reasoning, highlighting its diversity and difficulty for existing models.

Findings

01 Current models struggle with VL-GLUE tasks.

02 VL-GLUE covers diverse image types and domain-specific texts.

03 The benchmark encourages the development of more robust visuo-linguistic AI systems.

Abstract

Deriving inferences from heterogeneous inputs (such as images, text, and audio) is an important skill for humans to perform day-to-day tasks. A similar ability is desirable for the development of advanced Artificial Intelligence (AI) systems. While state-of-the-art models are rapidly closing the gap with human-level performance on diverse computer vision and NLP tasks separately, they struggle to solve tasks that require joint reasoning over visual and textual modalities. Inspired by GLUE (Wang et al., 2018), a multitask benchmark for natural language understanding, we propose VL-GLUE in this paper. VL-GLUE consists of over 100k samples spanning seven different tasks, which at their core require visuo-linguistic reasoning. Moreover, our benchmark comprises diverse image types (from synthetically rendered figures and day-to-day scenes to charts and complex diagrams) and…

Code & Models

Repositories

shailaja183/vl-glue
Official

Taxonomy

Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization