Toloka Visual Question Answering Benchmark

Dmitry Ustalov; Nikita Pavlichenko; Sergey Koshelev; Daniil Likhobaba; Alisa Smirnova

arXiv:2309.16511 · cs.CV · September 29, 2023 · 1 citation

TL;DR

This paper introduces Toloka Visual Question Answering, a large crowdsourced dataset for grounding visual question answering, enabling comparison of machine learning models against human performance, and reports that current models lag behind non-expert crowdsourcing.

Contribution

The paper presents a new dataset for grounding visual question answering and evaluates baseline models, highlighting the gap between machine and human performance.

Findings

01. No machine learning model outperformed non-expert crowdsourcing.
02. The dataset contains 45,199 image-question pairs with ground truth bounding boxes.
03. A multi-phase competition attracted 48 participants worldwide.

Abstract

In this paper, we present Toloka Visual Question Answering, a new crowdsourced dataset that allows comparing the performance of machine learning systems against the human level of expertise in the grounding visual question answering task. In this task, given an image and a textual question, one has to draw the bounding box around the object that correctly responds to that question. Every image-question pair contains the response, with only one correct response per image. Our dataset contains 45,199 pairs of images and questions in English, provided with ground truth bounding boxes, split into train and two test subsets. Besides describing the dataset and releasing it under a CC BY license, we conducted a series of experiments on open source zero-shot baseline models and organized a multi-phase competition at WSDM Cup that attracted 48 participants worldwide. However, by the time of paper submission,…
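
To make the task concrete: each sample pairs an image and a question with exactly one ground truth bounding box, so a prediction can be scored by how well its box overlaps the reference. The abstract as excerpted here does not name the scoring metric; the sketch below assumes intersection over union (IoU), a common choice for grounding tasks. The `Box` class, its field names, and the `iou` and `mean_iou` helpers are hypothetical and are not taken from the dataset or the competition code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        # Degenerate or inverted boxes are treated as empty.
        width = max(0.0, self.right - self.left)
        height = max(0.0, self.bottom - self.top)
        return width * height


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    intersection = Box(
        left=max(pred.left, truth.left),
        top=max(pred.top, truth.top),
        right=min(pred.right, truth.right),
        bottom=min(pred.bottom, truth.bottom),
    )
    inter_area = intersection.area()
    union_area = pred.area() + truth.area() - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


def mean_iou(preds: Sequence[Box], truths: Sequence[Box]) -> float:
    """Average IoU over paired predictions and ground truth boxes."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not preds:
        return 0.0
    return sum(iou(p, t) for p, t in zip(preds, truths)) / len(preds)


if __name__ == "__main__":
    truth = Box(left=0, top=0, right=10, bottom=10)
    pred = Box(left=5, top=0, right=15, bottom=10)
    # The boxes share a 5x10 strip: 50 / (100 + 100 - 50) = 1/3.
    print(iou(pred, truth))
```

Because every image has only one correct response, a single box-to-box comparison per sample is enough; no matching between multiple predicted and reference boxes is needed under this assumption.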

Code & Models

Repositories

toloka/wsdmcup2023 (PyTorch, official)

Datasets

toloka/WSDMCup2023
dataset · 28 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques