# Inferring and Executing Programs for Visual Reasoning

**Authors:** Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick

arXiv: 1705.03633 · 2017-05-11

## TL;DR

This paper introduces a neural model that constructs an explicit reasoning program for each visual reasoning question and then executes it to produce an answer. On the CLEVR benchmark it significantly outperforms strong black-box baselines.

## Contribution

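The paper presents a model that pairs a program generator, which builds an explicit representation of the reasoning to be performed, with an execution engine that runs the resulting program to produce an answer. Both components are neural networks, trained with a combination of backpropagation and REINFORCE.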

## Key findings

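- Significantly outperforms strong baselines on the CLEVR visual reasoning benchmark
- Generalizes better than those baselines in a variety of settings
- Represents the reasoning process as an explicit program instead of mapping inputs directly to outputs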

## Abstract

Existing methods for visual reasoning attempt to directly map inputs to outputs using black-box architectures without explicitly modeling the underlying reasoning processes. As a result, these black-box models often learn to exploit biases in the data rather than learning to perform visual reasoning. Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer. Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE. Using the CLEVR benchmark for visual reasoning, we show that our model significantly outperforms strong baselines and generalizes better in a variety of settings.
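The split between a program generator and an execution engine can be illustrated with a purely symbolic toy. This is a minimal sketch under loud assumptions, not the paper's implementation: in the paper both components are neural networks operating on images, whereas here the generator is a hard-coded lookup table, the scene is a hand-written list of attribute dictionaries, and the names `generate_program`, `MODULES`, `execute`, and the function vocabulary (`scene`, `filter_color`, `filter_shape`, `count`, `exist`) are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Obj = Dict[str, str]      # one scene object, e.g. {"color": "red", "shape": "cube"}
Step = Tuple[str, str]    # (function name, argument); argument is "" when unused


def generate_program(question: str) -> List[Step]:
    """Hypothetical stand-in for a learned program generator.

    A fixed lookup table replaces the neural network described in the
    abstract; it only knows the two questions listed here.
    """
    table: Dict[str, List[Step]] = {
        "How many red cubes are there?": [
            ("scene", ""), ("filter_color", "red"),
            ("filter_shape", "cube"), ("count", ""),
        ],
        "Is there a blue sphere?": [
            ("scene", ""), ("filter_color", "blue"),
            ("filter_shape", "sphere"), ("exist", ""),
        ],
    }
    return table[question]


# Hypothetical module library: one small function per program step.
# Each module receives the previous step's output, its argument, and the scene.
MODULES: Dict[str, Callable[[Any, str, List[Obj]], Any]] = {
    "scene": lambda state, arg, scene: list(scene),
    "filter_color": lambda state, arg, scene: [o for o in state if o["color"] == arg],
    "filter_shape": lambda state, arg, scene: [o for o in state if o["shape"] == arg],
    "count": lambda state, arg, scene: len(state),
    "exist": lambda state, arg, scene: len(state) > 0,
}


def execute(program: List[Step], scene: List[Obj]) -> Any:
    """Toy execution engine: run the steps in order, chaining their outputs."""
    state: Any = None
    for name, arg in program:
        state = MODULES[name](state, arg, scene)
    return state


SCENE: List[Obj] = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

answer = execute(generate_program("How many red cubes are there?"), SCENE)
print(answer)
```

The point of the sketch is only the separation of concerns: the program is an inspectable intermediate object, so a wrong answer can be traced either to a wrong program or to a faulty module. It handles linear chains of steps only and says nothing about how either component is trained.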

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.03633/full.md

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/1705.03633/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1705.03633/full.md

---
Source: https://tomesphere.com/paper/1705.03633