TallyQA: Answering Complex Counting Questions

Manoj Acharya; Kushal Kafle; Christopher Kanan

arXiv:1810.12440 · cs.CV · November 2, 2018 · 5 cites

TL;DR

This paper introduces TallyQA, a large dataset for complex counting in visual question answering, and proposes a relation network-based algorithm that achieves state-of-the-art results on multiple benchmarks.

Contribution

The paper presents TallyQA, the largest dataset for complex counting questions, and a novel relation network-based algorithm optimized for high-resolution images.

Findings

01. Achieved state-of-the-art results on the TallyQA and HowMany-QA benchmarks.
02. Demonstrated the effectiveness of relation networks with region proposals for complex counting.
03. Showed improved performance over baseline systems on complex counting tasks.

Abstract

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.
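The abstract names the core idea, a relation network (RN) applied to region proposals, but gives no equations. The sketch below uses the generic RN formulation of Santoro et al. (2017), RN(O, q) = f(Σ_{i,j} g([o_i; o_j; q])), and takes O to be region-proposal feature vectors. It is an illustration under stated assumptions, not the TallyQA authors' model: the helpers `dense`, `relu`, `toy_g`, `toy_f` and `num_pairs`, the 2-d proposal features, the 1-d question embedding and the 4 answer logits are all hypothetical. It also shows one plausible reading of the efficiency claim: an RN evaluates n² ordered pairs, so treating every feature-map cell of a high-resolution image as an object is far costlier than using a modest number of proposals (the 14x14 grid and 20 proposals are made-up sizes).

```python
from itertools import product
from typing import Callable, List, Sequence

Vector = List[float]


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Callable[[Vector], Vector]:
    """Toy fully connected layer x -> W x + b (hypothetical stand-in for a learned MLP)."""
    def layer(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return layer


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def relation_network(
    proposals: Sequence[Vector],
    question: Vector,
    g: Callable[[Vector], Vector],
    f: Callable[[Vector], Vector],
    relation_dim: int,
) -> Vector:
    """Generic RN: f(sum of g([o_i; o_j; q]) over all ordered pairs (i, j))."""
    total = [0.0] * relation_dim
    for o_i, o_j in product(proposals, repeat=2):
        # Each relation sees two proposal features plus the question embedding.
        r = g(list(o_i) + list(o_j) + list(question))
        if len(r) != relation_dim:
            raise ValueError("g must return relation_dim values")
        total = [a + b for a, b in zip(total, r)]
    # Summing over pairs makes the result independent of proposal order.
    return f(total)


def num_pairs(n: int) -> int:
    """Ordered pairs (including i == j) an RN evaluates for n input objects."""
    return n * n


def toy_g(x: Vector) -> Vector:
    # Hypothetical g_theta for 2-d proposals and a 1-d question (input length 5).
    return relu(dense([[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]], [0.0, 0.0])(x))


# Hypothetical f_phi mapping the 2-d relation sum to 4 answer logits.
toy_f = dense([[1, 0], [0, 1], [1, 1], [-1, 0]], [0.0, 0.0, 0.0, 0.0])

if __name__ == "__main__":
    props = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]  # made-up proposal features
    logits = relation_network(props, [1.0], toy_g, toy_f, relation_dim=2)
    print("logits:", logits, "predicted class:", logits.index(max(logits)))
    # Every cell of a 14x14 feature map vs. 20 region proposals:
    print(num_pairs(14 * 14), num_pairs(20))  # 38416 400
```

Because the pairwise terms are summed, shuffling the proposals leaves the output unchanged, and the last line makes the quadratic growth concrete: 196 grid cells give 38,416 pairs while 20 proposals give 400.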

Code & Models

Repositories

manoja328/tallyqacode (PyTorch, official)

Datasets

comfyuistudio/gm (dataset, 308 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning