ViUniT: Visual Unit Tests for More Robust Visual Programming

Artemis Panagopoulou; Honglu Zhou; Silvio Savarese; Caiming Xiong; Chris Callison-Burch; Mark Yatskar; Juan Carlos Niebles

arXiv:2412.08859·cs.CV·December 13, 2024

Open Access

TL;DR

ViUniT is a framework that automatically generates visual unit tests, each a novel image paired with an expected answer, to check the logical correctness of visual programs and make visual reasoning models more reliable.

Contribution

The paper presents a novel method for creating visual unit tests using language models and image synthesis, improving model robustness and enabling new applications in visual reasoning tasks.

Findings

01. Improves model performance by 11.4% across datasets

02. Enables open-source models to outperform GPT-4o-mini by 7.7%

03. Reduces incorrect program reasoning by 40%

Abstract

Programming-based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models answer correctly, they produce incorrect programs 33% of the time. These models are often right for the wrong reasons and risk unexpected failures on new data. Unit tests play a foundational role in ensuring code correctness and could be used to repair such failures. We propose Visual Unit Testing (ViUniT), a framework to improve the reliability of visual programs by automatically generating unit tests. In our framework, a unit test is represented as a novel image and answer pair meant to verify the logical correctness of a program produced for a given query. Our method leverages a language model to create unit tests in the form of image descriptions and expected answers and image synthesis to…
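As a rough illustration of the unit-test idea described in the abstract, the sketch below represents each test as an image description plus an expected answer and scores a candidate program by the fraction of tests it passes. Everything here is an assumption made for illustration and not the authors' implementation: the names `VisualUnitTest`, `synthesize_image`, and `pass_rate` are hypothetical, the "image" is a plain string standing in for a synthesized picture, the two toy programs do string matching instead of real visual reasoning, and scoring by pass rate is one plausible choice that the source does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List

# Placeholder type: a real system would hold pixel data here.
Image = str


@dataclass(frozen=True)
class VisualUnitTest:
    """One test: a description of an image to synthesize, plus the expected answer."""
    image_description: str
    expected_answer: str


def synthesize_image(description: str) -> Image:
    """Hypothetical stub for image synthesis: the 'image' is just its description."""
    return description


def pass_rate(program: Callable[[Image], str], tests: List[VisualUnitTest]) -> float:
    """Fraction of unit tests whose expected answer the program reproduces."""
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        image = synthesize_image(test.image_description)
        try:
            answer = program(image)
        except Exception:
            continue  # a crashing program counts as failing this test
        if answer.strip().lower() == test.expected_answer.strip().lower():
            passed += 1
    return passed / len(tests)


# Illustrative query: "Is there a dog to the left of the cat?"
tests = [
    VisualUnitTest("a dog sitting to the left of a cat", "yes"),
    VisualUnitTest("a cat sitting to the left of a dog", "no"),
    VisualUnitTest("a dog alone on a couch", "no"),
]


def shortcut_program(image: Image) -> str:
    # Ignores the spatial relation: right on the first test for the wrong reason.
    return "yes" if "dog" in image else "no"


def careful_program(image: Image) -> str:
    # Checks both the object and the relation (toy string matching).
    return "yes" if "dog" in image and "left of a cat" in image else "no"


shortcut_score = pass_rate(shortcut_program, tests)
careful_score = pass_rate(careful_program, tests)
```

In this toy setup the shortcut program answers the first test correctly while ignoring the spatial relation, so only the additional tests expose it; that is the "right for the wrong reasons" failure the abstract describes.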

Taxonomy

Topics: Teaching and Learning Programming