Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning

David Mascharka; Philip Tran; Ryan Soklaski; Arjun Majumdar

arXiv:1803.05268·cs.CV·January 24, 2019

TL;DR

This paper introduces visual reasoning primitives that enhance interpretability and achieve state-of-the-art accuracy in visual question answering, bridging the gap between model transparency and high performance.

Contribution

The authors propose a set of composable visual reasoning primitives that improve interpretability while maintaining high accuracy in complex visual reasoning tasks.
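As a rough illustration of what "composable primitives" can mean, the following is a minimal toy sketch, not the authors' implementation. It assumes attention masks are plain 2D lists over a tiny hand-written scene grid, and it approximates set intersection and union of masks with elementwise minimum and maximum. The names `SCENE`, `attend`, `and_`, `or_`, and `count` are hypothetical and introduced only for this example.

```python
# Toy sketch only: hand-written scene and rule-based primitives, standing in
# for learned modules. All names here are hypothetical.

# A 2x2 grid; each cell holds an object's attributes or None if empty.
SCENE = [
    [{"color": "red", "shape": "cube"}, None],
    [{"color": "blue", "shape": "sphere"}, {"color": "red", "shape": "sphere"}],
]


def attend(scene, mask, key, value):
    """Keep attention only on cells whose object has attribute key == value."""
    return [
        [m if (obj is not None and obj[key] == value) else 0.0
         for obj, m in zip(scene_row, mask_row)]
        for scene_row, mask_row in zip(scene, mask)
    ]


def and_(a, b):
    """Intersect two attention masks (elementwise minimum)."""
    return [[min(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def or_(a, b):
    """Union two attention masks (elementwise maximum)."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def count(mask, threshold=0.5):
    """Count the cells whose attention exceeds the threshold."""
    return sum(1 for row in mask for m in row if m > threshold)


# Compose primitives for the question "How many red spheres are there?"
ones = [[1.0, 1.0], [1.0, 1.0]]                   # start by attending everywhere
red = attend(SCENE, ones, "color", "red")         # intermediate mask 1
spheres = attend(SCENE, ones, "shape", "sphere")  # intermediate mask 2
red_spheres = and_(red, spheres)                  # composed mask
answer = count(red_spheres)
```

Because every step returns a mask over the image grid, each intermediate result (`red`, `spheres`, `red_spheres`) can be inspected directly, which is the sense in which a composed program of this kind is open to diagnosis.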

Findings

01

Achieved 99.1% accuracy on the CLEVR dataset.

02

Improved generalization on CoGenT by more than 20 percentage points.

03

Enabled diagnosis of model strengths and weaknesses through primitive outputs.

Abstract

Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for performing visual reasoning tasks. While modular networks were initially designed with a degree of model transparency, their performance on complex visual reasoning benchmarks was lacking. Current state-of-the-art approaches do not provide an effective mechanism for understanding the reasoning process. In this paper, we close the performance gap between interpretable models and state-of-the-art visual reasoning methods. We propose a set of visual-reasoning primitives which, when composed, manifest as a model capable of performing complex reasoning tasks in an explicitly-interpretable manner. The fidelity and interpretability of the primitives' outputs…

Code & Models

Repositories

davidmascharka/tbd-nets
PyTorch · Official

Taxonomy

Methods: Interpretability