TallyQA: Answering Complex Counting Questions
Manoj Acharya, Kushal Kafle, Christopher Kanan

TL;DR
This paper introduces TallyQA, a large dataset of complex counting questions for visual question answering, and proposes a counting algorithm based on relation networks with region proposals that achieves state-of-the-art results on TallyQA and HowMany-QA.
Contribution
The paper presents TallyQA, which the authors describe as the world's largest dataset for open-ended counting, and a new counting algorithm that applies relation networks to region proposals so that they can be used efficiently with high-resolution imagery.
Findings
Achieved state-of-the-art results on TallyQA and HowMany-QA benchmarks.
Demonstrated the effectiveness of relation networks with region proposals for complex counting.
Showed improved performance over baseline systems on complex counting tasks.
Abstract
Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution imagery. It yields state-of-the-art results compared to baseline and recent systems on both TallyQA and the HowMany-QA benchmark.
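As a rough illustration of the idea of applying a relation network to region proposals, the sketch below scores candidate counts from pairwise relations between region features, conditioned on a question embedding. It is not the authors' implementation: the function names, dimensions, single-layer g and f, and random untrained weights are all assumptions made for the example. It also shows why working on N region proposals keeps the cost at N(N-1) ordered pairs.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Random, untrained weights (illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def num_pairs(n_regions):
    """Ordered pairs of distinct regions the relation function must process."""
    return n_regions * (n_regions - 1)

def relation_count_scores(regions, question, g_layer, f_layer):
    """Return one score per candidate count (0..MAX_COUNT).

    regions  : list of N feature vectors, one per region proposal (assumed given)
    question : question embedding vector (assumed given)
    """
    pooled = [0.0] * len(g_layer[1])
    for i, o_i in enumerate(regions):
        for j, o_j in enumerate(regions):
            if i == j:
                continue
            # g: relation function on the concatenation [o_i, o_j, question]
            pair = relu(linear(o_i + o_j + question, *g_layer))
            # Sum pooling makes the result independent of region order
            pooled = [p + v for p, v in zip(pooled, pair)]
    # f: maps the pooled relations to count scores
    return linear(pooled, *f_layer)

def predict_count(scores):
    """Index of the highest-scoring candidate count."""
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical sizes chosen only to keep the example small
rng = random.Random(0)
REGION_DIM, QUESTION_DIM, HIDDEN, MAX_COUNT = 4, 3, 8, 5
g_layer = init_layer(2 * REGION_DIM + QUESTION_DIM, HIDDEN, rng)
f_layer = init_layer(HIDDEN, MAX_COUNT + 1, rng)
regions = [[rng.random() for _ in range(REGION_DIM)] for _ in range(6)]
question = [rng.random() for _ in range(QUESTION_DIM)]

scores = relation_count_scores(regions, question, g_layer, f_layer)
print(num_pairs(len(regions)), "pairs; predicted count:", predict_count(scores))
```

Because the weights here are untrained, the predicted count is meaningless; the point is only the structure: a pairwise function over region features, an order-invariant sum, and a final mapping to count scores.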
Code & Models
- 🤗 google/gemma-3-4b-it · model · 1.5M dl · ♡ 1272
- 🤗 google/gemma-3-27b-it · model · 1.0M dl · ♡ 1940
- 🤗 unsloth/gemma-3-12b-it-GGUF · model · 101k dl · ♡ 178
- 🤗 google/gemma-3-1b-it · model · 1.4M dl · ♡ 899
- 🤗 google/gemma-3-12b-it-qat-q4_0-gguf · model · 7.1k dl · ♡ 262
- 🤗 google/gemma-3-270m · model · 83k dl · ♡ 1003
- 🤗 google/gemma-3-12b-it · model · 2.6M dl · ♡ 698
- 🤗 google/gemma-3-12b-it-qat-q4_0-unquantized · model · 28k dl · ♡ 81
- 🤗 p-e-w/gemma-3-12b-it-heretic · model · 2.4k dl · ♡ 79
- 🤗 llmfan46/gemma-3-12b-it-ultra-uncensored-heretic-GGUF · model · 23k dl · ♡ 13
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
