Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering

Nandita Naik; Christopher Potts; Elisa Kreiss

arXiv:2307.15745·cs.CL·August 31, 2023

TL;DR

This paper introduces Context-VQA, a new dataset that pairs images with contextual website information, demonstrating that context significantly influences question types and emphasizing the need for context-aware VQA models, especially for accessibility.

Contribution

The paper presents Context-VQA, a dataset that incorporates context into VQA, highlighting the importance of context-awareness for improving accessibility and question relevance.

Findings

01

Question types vary systematically across different contexts.

02

Context effects are more pronounced when images are not visible.

03

Models should incorporate context to better serve accessibility needs.
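
The first finding, that question types vary across contexts, can be illustrated with a small counting sketch. This is not the authors' analysis code: the toy records below are invented for illustration and are not drawn from Context-VQA, and labeling a question by its first word is an assumed, crude proxy for question type. The names `TOY_QUESTIONS`, `question_type`, and `type_shares_by_context` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, NOT taken from Context-VQA: (context, question) pairs.
TOY_QUESTIONS = [
    ("travel", "Where was this photo taken?"),
    ("travel", "Where is the beach?"),
    ("travel", "Where can I find this view?"),
    ("travel", "What is the weather like?"),
    ("shopping", "What color is the jacket?"),
    ("shopping", "What material is it made of?"),
    ("shopping", "How long are the sleeves?"),
    ("shopping", "Where is the logo?"),
]


def question_type(question: str) -> str:
    """Label a question by its lowercased first word (an assumed, crude proxy)."""
    words = question.strip().split()
    return words[0].lower().strip("?,.") if words else ""


def type_shares_by_context(records):
    """Return {context: {question_type: share of that context's questions}}."""
    counts = defaultdict(Counter)
    for context, question in records:
        counts[context][question_type(question)] += 1
    return {
        context: {qtype: n / sum(counter.values()) for qtype, n in counter.items()}
        for context, counter in counts.items()
    }


shares = type_shares_by_context(TOY_QUESTIONS)
# Relative rate of "where" questions in one context versus another.
where_ratio = shares["travel"]["where"] / shares["shopping"]["where"]
print(shares)
print(where_ratio)
```

Comparing per-context shares rather than raw counts keeps the comparison meaningful when contexts contain different numbers of questions.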

Abstract

Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low vision prefer image explanations that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To further motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on…

Code & Models

Repositories

nnaik39/context-vqa (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques