ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Bingning Wang; Feiyang Lv; Ting Yao; Yiming Yuan; Jin Ma; Yu Luo; Haijin Liang

arXiv:2208.03030 · cs.CL · August 8, 2022

TL;DR

ChiQA introduces a large-scale, real-world question answering dataset that emphasizes unbiased, diverse queries requiring deep multi-modal reasoning, highlighting current model limitations.

Contribution

The paper presents ChiQA, a new dataset with real-world, unbiased questions and a focus on answerability, advancing multi-modal understanding beyond existing VQA datasets.

Findings

01. Existing models perform poorly on ChiQA, indicating room for improvement.

02. ChiQA's questions require complex reasoning and grounding.

03. The dataset reveals limitations in current visual-language models.

Abstract

Visual question answering is an important task in both natural language and vision understanding. However, in most public visual question answering datasets, such as VQA and CLEVR, the questions are human-generated and specific to the given image, such as 'What color are her eyes?'. These crowdsourced questions are relatively simple and are sometimes biased toward certain entities or attributes. In this paper, we introduce a new image-based question answering dataset, ChiQA. It contains real-world queries issued by internet users, each combined with several related open-domain images. The system should determine whether the image could answer the question or not. Unlike previous VQA datasets, the questions are real-world, image-independent queries that are more varied and unbiased. Compared with previous image-retrieval or image-caption datasets, the ChiQA…
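The task framing in the abstract (given a query and a candidate image, decide whether the image can answer the query) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `QueryImagePair` record, its field names, the binary `answerable` label, and the `accuracy` helper are hypothetical and are not taken from the dataset's actual schema or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class QueryImagePair:
    """Hypothetical record: one user query paired with one candidate image."""
    question: str      # real-world query text
    image_path: str    # location of the candidate image (assumed field)
    answerable: bool   # assumed binary label: can the image answer the query?


def accuracy(
    pairs: Iterable[QueryImagePair],
    predict: Callable[[str, str], bool],
) -> float:
    """Fraction of pairs where the predicted answerability matches the label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair to compute accuracy")
    correct = sum(predict(p.question, p.image_path) == p.answerable for p in pairs)
    return correct / len(pairs)


def always_no(question: str, image_path: str) -> bool:
    """Trivial baseline that never considers an image sufficient."""
    return False


# Toy, made-up examples purely to exercise the helper.
toy_pairs = [
    QueryImagePair("how to tie a bow tie", "img/bow_tie_steps.jpg", True),
    QueryImagePair("how to tie a bow tie", "img/tuxedo_ad.jpg", False),
]
baseline_accuracy = accuracy(toy_pairs, always_no)
```

A real model would replace `always_no` with a function that scores the query against the image; the trivial baseline is only there to show the interface.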

Code & Models

Repositories

benywon/ChiQA (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: ALBEF