ViUniT: Visual Unit Tests for More Robust Visual Programming
Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles

TL;DR
ViUniT is a framework that automatically generates visual unit tests, each an image paired with an expected answer, to check whether the programs produced by visual reasoning models are logically correct, with the aim of reducing errors and improving performance.
Contribution
The paper presents a novel method for creating visual unit tests using language models and image synthesis, improving model robustness and enabling new applications in visual reasoning tasks.
Findings
Improves model performance by 11.4% across datasets
Enables open-source models to outperform GPT-4o-mini by 7.7%
Reduces incorrect program reasoning by 40%
Abstract
Programming-based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the time. These models are often right for the wrong reasons and risk unexpected failures on new data. Unit tests play a foundational role in ensuring code correctness and could be used to repair such failures. We propose Visual Unit Testing (ViUniT), a framework to improve the reliability of visual programs by automatically generating unit tests. In our framework, a unit test is represented as a novel image and answer pair meant to verify the logical correctness of a program produced for a given query. Our method leverages a language model to create unit tests in the form of image descriptions and expected answers and image synthesis to…
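Example (illustrative sketch, not the authors' implementation)
The idea of a visual unit test as an image and expected-answer pair can be sketched roughly as follows. Everything named here is a hypothetical stand-in: the VisualUnitTest class, the pass_rate helper, and the use of an object-count dictionary in place of a synthesized image are assumptions made only to keep the example self-contained. The abstract does not specify how test results are aggregated, so the pass-rate score is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a dict of object counts stands in for a synthesized image.
# A real system would pass actual images to a visual program.
Image = Dict[str, int]


@dataclass
class VisualUnitTest:
    description: str      # image description, e.g. proposed by a language model
    image: Image          # placeholder for the image synthesized from it
    expected_answer: str  # answer a correct program should return


def pass_rate(program: Callable[[Image], str],
              tests: List[VisualUnitTest]) -> float:
    """Fraction of unit tests on which the program gives the expected answer."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        try:
            if program(test.image) == test.expected_answer:
                passed += 1
        except Exception:
            pass  # a program that crashes counts as failing the test
    return passed / len(tests)


# Hypothetical query: "Are there more dogs than cats?"
def correct_program(img: Image) -> str:
    return "yes" if img.get("dog", 0) > img.get("cat", 0) else "no"


def flawed_program(img: Image) -> str:
    # Logically wrong: only checks that a dog exists, yet is often right.
    return "yes" if img.get("dog", 0) > 0 else "no"


tests = [
    VisualUnitTest("two dogs and one cat", {"dog": 2, "cat": 1}, "yes"),
    VisualUnitTest("one dog and three cats", {"dog": 1, "cat": 3}, "no"),
    VisualUnitTest("one cat and no dogs", {"cat": 1}, "no"),
]

if __name__ == "__main__":
    print("correct:", pass_rate(correct_program, tests))
    print("flawed: ", pass_rate(flawed_program, tests))
```

In this toy setup the flawed program answers the first and third cases correctly but fails the second, which illustrates how a program can be right for the wrong reasons on some inputs and still be caught by a well-chosen test.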
Taxonomy
Topics: Teaching and Learning Programming
