IKIWISI: An Interactive Visual Pattern Generator for Evaluating the Reliability of Vision-Language Models Without Ground Truth
Md Touhidul Islam, Imran Kabir, Md Alimoor Reza, Syed Masum Billah

TL;DR
IKIWISI is an interactive visualization tool that helps evaluate vision-language models' reliability in video object recognition without ground truth by leveraging human pattern recognition and adversarial 'spy objects' to identify model errors.
Contribution
The paper introduces IKIWISI, a novel interactive visual pattern generator that enables human-in-the-loop evaluation of vision-language models without ground truth, using heatmaps and adversarial objects.
Findings
Users found IKIWISI easy to use.
Users' assessments correlated with objective metrics when those metrics were available.
The tool remained effective even when users explored only a limited portion of the heatmap.
Abstract
We present IKIWISI ("I Know It When I See It"), an interactive visual pattern generator for assessing vision-language models in video object recognition when ground truth is unavailable. IKIWISI transforms model outputs into a binary heatmap where green cells indicate object presence and red cells indicate object absence. This visualization leverages humans' innate pattern recognition abilities to evaluate model reliability. IKIWISI introduces "spy objects," adversarial instances that users know are absent, to reveal when models hallucinate nonexistent items. The tool functions as a cognitive audit mechanism, surfacing mismatches between human and machine perception by visualizing where models diverge from human understanding. Our study with 15 participants found that users considered IKIWISI easy to use, made assessments that correlated with objective metrics when available, and reached…
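
The heatmap and spy-object ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The layout (one row per object, one column per video frame), the function names, and the sample detections are all assumptions made for illustration. The sketch turns per-frame model outputs into a binary presence matrix, renders it as text ("G" for green/present, "R" for red/absent), and measures how often the model reports a spy object that is known to be absent.

```python
from typing import List, Set


def build_heatmap(detections: List[Set[str]], objects: List[str]) -> List[List[int]]:
    """Assumed layout: rows are objects, columns are frames.
    A cell is 1 if the model reports the object in that frame, else 0."""
    return [[1 if obj in frame else 0 for frame in detections] for obj in objects]


def render(heatmap: List[List[int]], objects: List[str]) -> str:
    """Text stand-in for the colored heatmap: G = present (green), R = absent (red)."""
    width = max(len(obj) for obj in objects)
    lines = []
    for obj, row in zip(objects, heatmap):
        cells = "".join("G" if cell else "R" for cell in row)
        lines.append(f"{obj:<{width}} {cells}")
    return "\n".join(lines)


def spy_false_positive_rate(
    heatmap: List[List[int]], objects: List[str], spies: Set[str]
) -> float:
    """Fraction of spy-object cells marked present. Spy objects are known
    to be absent, so every such cell is a hallucination."""
    cells = [c for obj, row in zip(objects, heatmap) if obj in spies for c in row]
    return sum(cells) / len(cells) if cells else 0.0


# Hypothetical model outputs: the set of objects reported in each of four frames.
detections = [{"car", "person"}, {"car"}, {"car", "giraffe"}, {"person"}]
objects = ["car", "person", "giraffe"]
spies = {"giraffe"}  # assumed spy object, known to be absent from the video

heatmap = build_heatmap(detections, objects)
print(render(heatmap, objects))
print(spy_false_positive_rate(heatmap, objects, spies))
```

In this toy example, a green cell in the "giraffe" row marks a frame where the model hallucinated the spy object. That is the kind of pattern a viewer could spot without any ground-truth labels for the real objects.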
