IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth

Md Touhidul Islam; Imran Kabir; Md Alimoor Reza; Syed Masum Billah

arXiv:2505.22305 · cs.CV · May 29, 2025

TL;DR

IKIWISI is an interactive visualization tool that helps evaluate vision-language models' reliability in video object recognition without ground truth by leveraging human pattern recognition and adversarial 'spy objects' to identify model errors.

Contribution

The paper introduces IKIWISI, a novel interactive visual pattern generator that enables human-in-the-loop evaluation of vision-language models without ground truth, using heatmaps and adversarial objects.

Findings

01

Users found IKIWISI easy to use.

02

Users' assessments correlated with objective metrics when those metrics were available.

03

The tool remained effective even with limited exploration of the heatmap.

Abstract

We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to evaluate model reliability. IKIWISI introduces "spy objects": adversarial instances users know are absent, to discern models hallucinating on nonexistent items. The tool functions as a cognitive audit mechanism, surfacing mismatches between human and machine perception by visualizing where models diverge from human understanding. Our study with 15 participants found that users considered IKIWISI easy to use, made assessments that correlated with objective metrics when available, and reached…
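The heatmap and spy-object ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_heatmap`, `spy_hallucination_rate`, `render`), the per-frame set-of-labels input format, and the sample data are all assumptions made for illustration. Rows are objects, columns are video frames, and a cell is 1 (green) when the model reports the object as present and 0 (red) otherwise.

```python
from typing import Dict, List, Set


def build_heatmap(detections: List[Set[str]], objects: List[str]) -> Dict[str, List[int]]:
    """Map each object to one binary cell per frame: 1 = reported present, 0 = reported absent."""
    return {obj: [1 if obj in frame else 0 for frame in detections] for obj in objects}


def spy_hallucination_rate(heatmap: Dict[str, List[int]], spy_objects: List[str]) -> float:
    """Fraction of spy-object cells marked present.

    Spy objects are known to be absent from the video, so every
    'present' cell in their rows is a hallucination.
    """
    cells = [cell for obj in spy_objects for cell in heatmap[obj]]
    return sum(cells) / len(cells) if cells else 0.0


def render(heatmap: Dict[str, List[int]]) -> str:
    """Plain-text stand-in for the colored grid: G = green (present), R = red (absent)."""
    return "\n".join(
        f"{obj:>8} " + "".join("G" if cell else "R" for cell in row)
        for obj, row in heatmap.items()
    )


# Hypothetical model output: the set of object labels reported in each of four frames.
detections = [{"car", "person"}, {"car"}, {"car", "giraffe"}, {"person"}]
objects = ["car", "person", "giraffe"]
spy_objects = ["giraffe"]  # assumed to be absent from the video

heatmap = build_heatmap(detections, objects)
rate = spy_hallucination_rate(heatmap, spy_objects)

print(render(heatmap))
print(f"spy hallucination rate: {rate:.2f}")
```

In this toy data the spy row contains one green cell out of four, which is the kind of visual anomaly a viewer could spot without any ground-truth labels for the real objects.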
