SITUATE -- Synthetic Object Counting Dataset for VLM training

René Peinl; Vincent Tischler; Patrick Schröder; Christian Groth

arXiv:2602.00108 · cs.CV · February 3, 2026

TL;DR

SITUATE is a new dataset for training and evaluating vision-language models on counting tasks with spatial constraints, improving generalization to out-of-distribution images.

Contribution

The paper introduces SITUATE, a synthetic dataset for training counting models that gives control over occlusions and spatial composition.

Findings

01

Finetuning Qwen VL 2.5 7B on SITUATE improves accuracy on Pixmo count test data.

02

SITUATE helps models generalize better to out-of-distribution images.

03

The dataset bridges the gap between simple 2D datasets and ambiguous real-world datasets.

Abstract

We present SITUATE, a novel dataset designed for training and evaluating Vision Language Models on counting tasks with spatial constraints. The dataset bridges the gap between simple 2D datasets like VLMCountBench and often ambiguous real-life datasets like TallyQA, which lack control over occlusions and spatial composition. Experiments show that our dataset helps to improve generalization to out-of-distribution images: fine-tuning Qwen VL 2.5 7B on SITUATE improves accuracy on the Pixmo count test data, but not vice versa. We cross-validate this by comparing model performance across other established counting benchmarks and against an equally sized fine-tuning set derived from Pixmo count.
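
The abstract reports accuracy on counting benchmarks without defining the metric on this page. As a minimal sketch, assuming exact-match accuracy (a prediction counts as correct only when the number parsed from the model's answer equals the ground-truth count), the scoring could look like the following. The function names `parse_count` and `counting_accuracy` and the parsing rule are hypothetical illustrations, not the authors' evaluation code.

```python
import re


def parse_count(answer: str):
    """Return the first integer in a free-text answer, or None if there is none.

    Assumption: the model states the count as digits, e.g. "There are 3 cups."
    """
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None


def counting_accuracy(answers, true_counts):
    """Fraction of answers whose parsed count equals the ground-truth count."""
    if len(answers) != len(true_counts):
        raise ValueError("answers and true_counts must have the same length")
    if not answers:
        return 0.0
    correct = sum(parse_count(a) == t for a, t in zip(answers, true_counts))
    return correct / len(answers)


# Hypothetical model outputs and labels, for illustration only.
answers = ["There are 3 cups.", "5", "none"]
true_counts = [3, 4, 0]
score = counting_accuracy(answers, true_counts)
```

Under this assumption, the comparison described in the abstract would apply the same scoring to a model fine-tuned on SITUATE and to one fine-tuned on the equally sized Pixmo count subset, each evaluated on the other dataset's test data.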

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications