TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models

Daniel Nobrega Medeiros

arXiv:2603.00206 · cs.CV · March 3, 2026

TL;DR

The TACIT Benchmark is a programmatic visual reasoning benchmark of 10 tasks across 6 reasoning domains with two evaluation tracks, generative and discriminative, enabling reproducible, deterministically scored assessment of both kinds of model on structured visual reasoning.

Contribution

It presents a novel programmatic benchmark spanning six reasoning domains, with deterministic puzzle generation and dual-track (generative and discriminative) evaluation of visual reasoning models.

Findings

01. Provides 6,000 puzzles across 3 resolutions
02. Dual evaluation tracks for generative and discriminative models
03. Reproducible and fully deterministic puzzle generation
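
To illustrate what deterministic puzzle generation can look like, here is a minimal Python sketch; it is not the benchmark's actual generator. It assumes each puzzle is identified by a task name, an index and a base seed (all hypothetical parameters), hashes them with SHA-256 into a seed, and draws every random choice from a private random.Random instance, so regenerating with the same identifiers reproduces the same puzzle. The obstacle-grid task and the functions derive_seed and generate_obstacle_grid are illustrative inventions.

```python
import hashlib
import random


def derive_seed(task: str, index: int, base_seed: int = 0) -> int:
    """Hypothetical: map (task, index, base_seed) to a stable 64-bit seed.

    SHA-256 is used instead of Python's built-in hash(), which is salted
    per process for strings and therefore not reproducible across runs.
    """
    key = f"{task}:{index}:{base_seed}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def generate_obstacle_grid(task: str, index: int, size: int = 8,
                           density: float = 0.25,
                           base_seed: int = 0) -> list[list[int]]:
    """Hypothetical toy generator: a size x size grid, 1 = wall, 0 = free.

    All randomness comes from a private random.Random seeded by derive_seed,
    so identical arguments always yield an identical grid. This toy does not
    check that a path from start to goal exists; a real generator would.
    """
    rng = random.Random(derive_seed(task, index, base_seed))
    grid = [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]
    # Keep the start (top-left) and goal (bottom-right) cells free.
    grid[0][0] = 0
    grid[size - 1][size - 1] = 0
    return grid


if __name__ == "__main__":
    first = generate_obstacle_grid("maze", 42)
    second = generate_obstacle_grid("maze", 42)
    print(first == second)  # same identifiers, same puzzle
```

Hashing the identifiers rather than relying on Python's built-in hash() matters because string hashing is randomized per process (controlled by PYTHONHASHSEED), which would silently break reproducibility between runs.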

Abstract

Existing visual reasoning benchmarks predominantly rely on natural language prompts, evaluate narrow reasoning modalities, or depend on subjective scoring procedures such as LLM-as-judge. We introduce the TACIT Benchmark, a programmatic visual reasoning benchmark comprising 10 tasks across 6 reasoning domains: spatial navigation, abstract pattern completion, causal simulation, logical constraint satisfaction, graph theory, and topology. The benchmark provides dual-track evaluation: a generative track in which models must produce solution images verified through deterministic computer-vision pipelines, and a discriminative track offering five-way multiple choice with structurally plausible near-miss distractors. Each distractor violates exactly one structural constraint, requiring models to reason about fine-grained visual differences rather than exploit superficial cues. Version 0.1.0…
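
The near-miss idea in the discriminative track can be sketched on a symbolic stand-in. The Python example below is a hypothetical illustration, not the benchmark's code: it uses graph coloring as the puzzle and treats each edge as one structural constraint. Each distractor recolors a single vertex to a color used by exactly one of its neighbors, so it violates exactly one constraint; the correct answer is then shuffled with four distractors into a five-way item using a seeded RNG. count_violations plays the role a deterministic verifier would, but on symbolic data rather than rendered images. All function names are invented for this sketch.

```python
import random

Edge = tuple[int, int]
Coloring = dict[int, int]


def count_violations(edges: list[Edge], coloring: Coloring) -> int:
    """Count edges whose endpoints share a color (0 means the coloring is valid)."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def near_miss_distractors(edges: list[Edge], coloring: Coloring,
                          num_colors: int, k: int,
                          rng: random.Random) -> list[Coloring]:
    """Return up to k distinct colorings that each violate exactly one edge.

    Assumes `coloring` is valid and the graph is simple (no duplicate edges).
    Recoloring vertex v to color c breaks exactly one edge when exactly one
    neighbor of v already has color c; edges not touching v are unchanged.
    """
    neighbors: dict[int, list[int]] = {v: [] for v in coloring}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    candidates = [
        (v, c)
        for v in sorted(coloring)
        for c in range(num_colors)
        if c != coloring[v]
        and sum(1 for n in neighbors[v] if coloring[n] == c) == 1
    ]
    rng.shuffle(candidates)
    distractors = []
    for v, c in candidates[:k]:
        wrong = dict(coloring)
        wrong[v] = c  # single-vertex edit -> single violated constraint
        distractors.append(wrong)
    return distractors


def build_five_way_item(edges: list[Edge], coloring: Coloring,
                        num_colors: int,
                        seed: int) -> tuple[list[Coloring], int]:
    """Shuffle the correct coloring with four near-miss distractors.

    Returns (options, answer_index); the same seed gives the same item.
    """
    if count_violations(edges, coloring) != 0:
        raise ValueError("reference coloring must be valid")
    rng = random.Random(seed)
    distractors = near_miss_distractors(edges, coloring, num_colors, 4, rng)
    if len(distractors) < 4:
        raise ValueError("not enough near-miss distractors for a 5-way item")
    options = [coloring] + distractors
    rng.shuffle(options)
    return options, options.index(coloring)


def accuracy(predictions: list[int], answers: list[int]) -> float:
    """Fraction of items where the predicted option index equals the answer."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


if __name__ == "__main__":
    # Toy instance: a 6-cycle colored with 3 colors.
    cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    solution = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 2}
    options, answer = build_five_way_item(cycle, solution, num_colors=3, seed=7)
    print([count_violations(cycle, o) for o in options], "answer:", answer)
```

With five options, random guessing yields 20% expected accuracy, which is the natural baseline against which discriminative-track scores would be read.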

Code & Models

Datasets

tylerxdurden/TACIT-benchmark (dataset · 45k downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis