VQA-Levels: A Hierarchical Approach for Classifying Questions in VQA
Madhuri Latha Madaka, Chakravarthy Bhagvati

TL;DR
This paper introduces VQA-Levels, a benchmark dataset for Visual Question Answering (VQA) that classifies each question into one of seven levels of complexity, so that VQA systems can be evaluated and improved systematically.
Contribution
The paper presents a novel hierarchical classification scheme and a new benchmark dataset for analyzing VQA system performance across question complexity levels.
Findings
Existing systems achieve high success on low-level questions
Performance is lower on questions that require high-level abstraction
VQA-Levels enables systematic analysis of VQA capabilities
Abstract
Designing datasets for Visual Question Answering (VQA) is a difficult and complex task that requires NLP for parsing and computer vision for analysing the relevant aspects of the image for answering the question asked. Several benchmark datasets have been developed by researchers but there are many issues with using them for methodical performance tests. This paper proposes a new benchmark dataset -- a pilot version called VQA-Levels is ready now -- for testing VQA systems systematically and assisting researchers in advancing the field. The questions are classified into seven levels ranging from direct answers based on low-level image features (without needing even a classifier) to those requiring high-level abstraction of the entire image content. The questions in the dataset exhibit one or many of ten properties. Each is categorised into a specific level from 1 to 7. Levels 1 - 3 are…
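A per-level breakdown of the kind the abstract describes can be sketched as follows. This is only an illustrative sketch, not the authors' evaluation code: the function name `accuracy_by_level`, the `(level, is_correct)` record format, and the toy records are all assumptions made for this example. The only detail taken from the abstract is that each question carries a level from 1 to 7.

```python
from collections import defaultdict


def accuracy_by_level(records):
    """Return {level: accuracy} for an iterable of (level, is_correct) pairs.

    Hypothetical helper: the record format is an assumption for this sketch,
    not the dataset's actual schema.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for level, is_correct in records:
        # The abstract states that every question is assigned a level from 1 to 7.
        if not 1 <= level <= 7:
            raise ValueError(f"level must be between 1 and 7, got {level}")
        totals[level] += 1
        correct[level] += int(bool(is_correct))
    # Only levels that actually appear in the records are reported.
    return {level: correct[level] / totals[level] for level in sorted(totals)}


# Toy, made-up records purely to show the call; these are not results from the paper.
toy_records = [(1, True), (1, True), (7, False), (7, True)]
per_level = accuracy_by_level(toy_records)
print(per_level)
```

Reporting accuracy per level rather than as a single aggregate is what makes a gap between low-level and high-level questions visible.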
Taxonomy
Topics: Risk and Safety Analysis
