TACIT Benchmark: A Programmatic Visual Reasoning Benchmark for Generative and Discriminative Models
Daniel Nobrega Medeiros

TL;DR
The TACIT Benchmark introduces a programmatic visual reasoning benchmark of 10 tasks across 6 reasoning domains with two evaluation tracks, enabling robust assessment of both generative and discriminative models on structured visual reasoning.
Contribution
It presents a novel, programmatic benchmark spanning six reasoning domains, with deterministic puzzle generation and dual-track evaluation for visual reasoning models.
Findings
6,000 puzzles across 3 resolutions
Dual evaluation tracks for generative and discriminative models
Reproducible and fully deterministic puzzle generation
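The summary lists reproducible, fully deterministic puzzle generation as a finding but does not say how it is achieved. One common way to get that property is to derive each puzzle's random seed from its identity (task, resolution, index) with a stable hash, so that regenerating a puzzle always yields identical content. The following is a minimal sketch under that assumption; the names `puzzle_seed` and `generate_grid_puzzle`, the toy wall grid, and the seeding scheme are hypothetical illustrations, not TACIT's actual generator.

```python
import hashlib
import random


def puzzle_seed(task: str, resolution: int, index: int, base_seed: int = 0) -> int:
    """Derive a stable 64-bit seed from a puzzle's identity (hypothetical scheme)."""
    key = f"{base_seed}:{task}:{resolution}:{index}".encode("utf-8")
    # SHA-256 is stable across runs and platforms, unlike Python's salted hash() for str.
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def generate_grid_puzzle(task: str, resolution: int, index: int, size: int = 8) -> list[list[int]]:
    """Toy stand-in for a puzzle generator: a size x size grid of walls (1) and open cells (0)."""
    # A private Random instance keeps generation isolated from any global RNG state.
    rng = random.Random(puzzle_seed(task, resolution, index))
    grid = [[1 if rng.random() < 0.3 else 0 for _ in range(size)] for _ in range(size)]
    grid[0][0] = 0                # keep the start cell open
    grid[size - 1][size - 1] = 0  # keep the goal cell open
    return grid


# Regenerating the same (task, resolution, index) reproduces the puzzle exactly.
a = generate_grid_puzzle("maze", 512, 7)
b = generate_grid_puzzle("maze", 512, 7)
print(a == b)
```

Hashing the identity rather than drawing from one shared RNG stream means any single puzzle can be regenerated in isolation, without replaying the whole dataset.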
Abstract
Existing visual reasoning benchmarks predominantly rely on natural language prompts, evaluate narrow reasoning modalities, or depend on subjective scoring procedures such as LLM-as-judge. We introduce the TACIT Benchmark, a programmatic visual reasoning benchmark comprising 10 tasks across 6 reasoning domains: spatial navigation, abstract pattern completion, causal simulation, logical constraint satisfaction, graph theory, and topology. The benchmark provides dual-track evaluation: a generative track in which models must produce solution images verified through deterministic computer-vision pipelines, and a discriminative track offering five-way multiple choice with structurally plausible near-miss distractors. Each distractor violates exactly one structural constraint, requiring models to reason about fine-grained visual differences rather than exploit superficial cues. Version 0.1.0…
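The abstract states that each discriminative-track distractor violates exactly one structural constraint. A minimal sketch of that idea, assuming a toy graph-coloring item in which every edge is one constraint ("endpoints must differ in color"), is to enumerate single-vertex recolorings of the correct answer and keep only those that break exactly one edge. The coloring task and the functions `violations`, `near_miss_distractors`, and `five_way_item` are hypothetical illustrations, not TACIT's distractor generator, and the generative track's computer-vision verification is not modeled.

```python
import random

Coloring = tuple[int, ...]
Edge = tuple[int, int]


def violations(coloring: Coloring, edges: list[Edge]) -> list[Edge]:
    """Return the edges whose endpoints share a color (each edge is one constraint)."""
    return [(u, v) for u, v in edges if coloring[u] == coloring[v]]


def near_miss_distractors(solution: Coloring, edges: list[Edge], num_colors: int,
                          k: int, rng: random.Random) -> list[Coloring]:
    """Sample k recolorings of the solution that each violate exactly one constraint."""
    assert not violations(solution, edges), "solution must satisfy every constraint"
    candidates = []
    for v in range(len(solution)):
        for c in range(num_colors):
            if c == solution[v]:
                continue
            cand = solution[:v] + (c,) + solution[v + 1:]
            # Keep only near misses: one broken constraint, not zero and not several.
            if len(violations(cand, edges)) == 1:
                candidates.append(cand)
    if len(candidates) < k:
        raise ValueError("not enough single-violation distractors for this item")
    return rng.sample(candidates, k)


def five_way_item(solution: Coloring, edges: list[Edge], num_colors: int,
                  seed: int) -> tuple[list[Coloring], int]:
    """Build a seeded five-way multiple-choice item; return (options, answer_index)."""
    rng = random.Random(seed)
    options = [solution] + near_miss_distractors(solution, edges, num_colors, 4, rng)
    rng.shuffle(options)
    return options, options.index(solution)


# Usage: a 6-cycle properly colored with 3 colors.
edges = [(i, (i + 1) % 6) for i in range(6)]
solution = (0, 1, 2, 0, 1, 2)
options, answer = five_way_item(solution, edges, num_colors=3, seed=42)
for i, opt in enumerate(options):
    print(i, opt, len(violations(opt, edges)))
```

Filtering on an exact violation count is what keeps distractors plausible: a recoloring that breaks no edge would be a second correct answer, and one that breaks several would be easier to reject on superficial grounds.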
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis
