Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering

Xingyu Fu; Ben Zhou; Sihao Chen; Mark Yatskar; Dan Roth

arXiv:2305.14882 · cs.CL · April 16, 2024 · 2 cites

TL;DR

This paper introduces DCLUB, an inherently interpretable visual question answering model that provides human-readable visual clues as intermediate explanations, improving reasoning accuracy while maintaining high performance.

Contribution

The paper proposes the DCLUB model, which offers an interpretable VQA system with intermediate visual clues, and introduces a new dataset for training and evaluating explanations.

Findings

1. DCLUB improves reasoning accuracy by 4.64% over black-box models.
2. DCLUB maintains 99.43% of VQA-v2 performance.
3. The model provides human-readable visual clues as explanations.

Abstract

Recent advances in multimodal large language models (LLMs) have shown extreme effectiveness in visual question answering (VQA). However, the design nature of these end-to-end models prevents them from being interpretable to humans, undermining trust and applicability in critical domains. While post-hoc rationales offer certain insight into understanding model behavior, these explanations are not guaranteed to be faithful to the model. In this paper, we address these shortcomings by introducing an interpretable-by-design model that factors model decisions into intermediate human-legible explanations and allows people to easily understand why a model fails or succeeds. We propose the Dynamic Clue Bottleneck Model (DCLUB), a method designed towards an inherently interpretable VQA system. DCLUB provides an explainable intermediate space before the VQA decision and is faithful…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning

Methods: Focus