Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

Jiwan Chung; Sungjae Lee; Minseo Kim; Seungju Han; Ashkan Yousefpour; Jack Hessel; Youngjae Yu

arXiv:2406.18925 · cs.CL · October 24, 2024

TL;DR

This paper introduces VisArgs, a dataset and benchmark for evaluating AI's ability to understand visual arguments, emphasizing the challenge of selective vision in interpreting images within argumentative contexts.

Contribution

The paper presents a new dataset, VisArgs, with annotated visual and commonsense premises, and proposes three tasks to assess AI understanding of visual arguments, highlighting current model limitations.

Findings

01 AI models struggle with visual premise localization and identification.

02 Providing relevant visual premises improves model accuracy.

03 Humans outperform AI in understanding visual arguments.

Abstract

Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI capable of similar understanding? We present VisArgs, a dataset of 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction. Experiments show that 1) machines struggle to capture visual cues: GPT-4-O achieved 78.5%…
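The annotation structure described in the abstract (visual premises with regions, commonsense premises, and reasoning trees that connect them to a conclusion) can be sketched as a small data model. This is a minimal illustration only: the class names, field names, box convention, and the example ad below are all hypothetical and are not taken from the released VisArgs schema. The `iou` helper shows one common way to score a predicted region against an annotated one; the abstract as quoted here does not say which localization metric the paper uses, so treat that choice as an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class VisualPremise:
    text: str      # what the viewer should notice in the image
    region: Box    # where that cue is located


@dataclass
class CommonsensePremise:
    text: str      # background knowledge not visible in the image


@dataclass
class ReasoningStep:
    visual_ids: List[int]        # indices into visual_premises
    commonsense_ids: List[int]   # indices into commonsense_premises
    conclusion: str              # intermediate or final conclusion


@dataclass
class VisualArgument:
    visual_premises: List[VisualPremise]
    commonsense_premises: List[CommonsensePremise]
    steps: List[ReasoningStep]   # assumed ordered; the last step is final

    def final_conclusion(self) -> str:
        return self.steps[-1].conclusion


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Invented example (not from the dataset): an anti-smoking poster.
example = VisualArgument(
    visual_premises=[
        VisualPremise("A cigarette is shaped like a gun barrel.", (40, 60, 200, 120)),
    ],
    commonsense_premises=[
        CommonsensePremise("Guns can kill people."),
    ],
    steps=[
        ReasoningStep([0], [0], "Smoking is as deadly as a gun."),
    ],
)

predicted_region: Box = (50, 60, 210, 120)
overlap = iou(example.visual_premises[0].region, predicted_region)
print(example.final_conclusion(), round(overlap, 3))
```

Under this sketch, premise localization compares a predicted region with `VisualPremise.region`, premise identification selects which entries of `visual_premises` support a given step, and conclusion deduction produces the text returned by `final_conclusion()`.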

Code & Models

Repositories

jiwanchung/visargs
PyTorch · Official

Datasets

jiwan-chung/visargs
dataset · 14 downloads

Videos

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding · underline

Taxonomy

Topics: Language, Metaphor, and Cognition