Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
Nandita Naik, Christopher Potts, Elisa Kreiss

TL;DR
This paper introduces Context-VQA, a new dataset that pairs images with contextual website information, demonstrating that context significantly influences question types and emphasizing the need for context-aware VQA models, especially for accessibility.
Contribution
The paper presents Context-VQA, a dataset that incorporates context into VQA, highlighting the importance of context-awareness for improving accessibility and question relevance.
Findings
Question types vary systematically across different contexts.
Context effects are more pronounced when images are not visible.
Models should incorporate context to better serve accessibility needs.
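The first finding can be made concrete with a small tally of question types per context. The sketch below is a hypothetical illustration, not the authors' analysis: the records in SAMPLE are invented for this example and do not come from Context-VQA, and classifying a question by its first word is an assumed simplification.

```python
from collections import Counter, defaultdict

# Hypothetical toy records (context, question); NOT taken from Context-VQA.
SAMPLE = [
    ("travel", "Where was this photo taken?"),
    ("travel", "Where is the nearest beach?"),
    ("travel", "Where does this trail start?"),
    ("travel", "What is the weather like?"),
    ("shopping", "What color is the jacket?"),
    ("shopping", "Where is the logo placed?"),
    ("shopping", "How long are the sleeves?"),
    ("shopping", "Is the fabric shiny?"),
]

# Assumed set of question words; anything else is labeled "other".
QUESTION_WORDS = ("where", "what", "who", "how", "why", "when")


def question_type(question: str) -> str:
    """Label a question by its first word, or 'other' if it is not a known question word."""
    words = question.strip().lower().split()
    first = words[0] if words else ""
    return first if first in QUESTION_WORDS else "other"


def type_shares(records):
    """Return {context: {question_type: share of that context's questions}}."""
    counts = defaultdict(Counter)
    for context, question in records:
        counts[context][question_type(question)] += 1
    return {
        ctx: {qtype: n / sum(c.values()) for qtype, n in c.items()}
        for ctx, c in counts.items()
    }


shares = type_shares(SAMPLE)
# Ratio of the "where" share in one context to the share in another.
ratio = shares["travel"]["where"] / shares["shopping"]["where"]
print(f"'Where?' share, travel vs. shopping (toy data): {ratio:.1f}x")
```

Comparing shares rather than raw counts keeps the ratio meaningful when contexts have different numbers of questions. Any figure this prints reflects only the toy records above, not the dataset.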
Abstract
Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low vision prefer image explanations that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To further motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
