Doubly Right Object Recognition: A Why Prompt for Visual Rationales

Chengzhi Mao; Revant Teotia; Amrutha Sundar; Sachit Menon; Junfeng Yang; Xin Wang; Carl Vondrick

arXiv:2212.06202 · cs.CV · March 27, 2023 · 1 citation

TL;DR

This paper introduces a benchmark for evaluating whether visual recognition models can produce correct rationales alongside their predictions, and proposes a method to improve rationale accuracy through a 'why prompt' that transfers language model rationales to visual models.

Contribution

The paper presents the 'doubly right' object recognition benchmark and a novel 'why prompt' method that enhances visual models' ability to generate correct rationales, improving interpretability.

Findings

01. State-of-the-art visual models, such as CLIP, often produce incorrect rationales for their predictions.

02. Transferring rationales from language models into visual representations improves the explanations that visual models give.

03. The 'why prompt' enhances zero-shot transfer to unseen tasks.

Abstract

Many visual recognition models are evaluated only on their classification accuracy, a metric for which they obtain strong performance. In this paper, we investigate whether computer vision models can also provide correct rationales for their predictions. We propose a "doubly right" object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales. We find that state-of-the-art visual models, such as CLIP, often provide incorrect rationales for their categorical predictions. However, by transferring the rationales from language models into visual representations through a tailored dataset, we show that we can learn a "why prompt," which adapts large visual representations to produce correct rationales. Visualizations and empirical experiments show that our prompts significantly improve performance on…
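The "doubly right" criterion in the abstract can be illustrated with a small scoring sketch. This is not the authors' evaluation code: the function name `doubly_right_breakdown`, the bucket names, and the assumption that each image comes with a set of acceptable rationales are all hypothetical choices made for illustration. A prediction counts as doubly right only when both its label and its rationale are correct.

```python
from collections import Counter
from typing import Dict, Sequence, Set, Tuple

# Hypothetical data shapes, chosen for this sketch only.
Prediction = Tuple[str, str]      # (predicted label, predicted rationale)
Reference = Tuple[str, Set[str]]  # (true label, set of acceptable rationales)

BUCKETS = (
    "right_label_right_rationale",  # the "doubly right" case
    "right_label_wrong_rationale",
    "wrong_label_right_rationale",
    "wrong_label_wrong_rationale",
)


def doubly_right_breakdown(
    preds: Sequence[Prediction], refs: Sequence[Reference]
) -> Dict[str, float]:
    """Return the fraction of examples falling into each label/rationale bucket."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not preds:
        raise ValueError("at least one example is required")

    counts: Counter = Counter()
    for (pred_label, pred_rationale), (true_label, ok_rationales) in zip(preds, refs):
        label_part = "right" if pred_label == true_label else "wrong"
        # Assumption: a rationale is correct if it appears in the reference set.
        rationale_part = "right" if pred_rationale in ok_rationales else "wrong"
        counts[f"{label_part}_label_{rationale_part}_rationale"] += 1

    total = len(preds)
    return {bucket: counts[bucket] / total for bucket in BUCKETS}


if __name__ == "__main__":
    refs = [("zebra", {"black and white stripes"})] * 4
    preds = [
        ("zebra", "black and white stripes"),  # right label, right rationale
        ("zebra", "four legs"),                # right label, wrong rationale
        ("horse", "black and white stripes"),  # wrong label, right rationale
        ("horse", "long tail"),                # wrong label, wrong rationale
    ]
    for bucket, fraction in doubly_right_breakdown(preds, refs).items():
        print(f"{bucket}: {fraction:.2f}")
```

Plain classification accuracy would score the first two toy predictions identically, while the doubly right fraction credits only the first. Exact string matching against a reference set is the simplest possible correctness test and is used here only to keep the sketch self-contained.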

Code & Models

Repositories

cvlab-columbia/doubleright (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training