# V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

**Authors:** Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

arXiv: 1907.12271 · 2019-07-30

## TL;DR

V-PROM is a new benchmark designed to evaluate the ability of visual reasoning models to generalize over complex visual relationships, highlighting current models' limitations in abstract reasoning tasks.

## Contribution

The paper introduces V-PROM, a large-scale visual reasoning benchmark based on visual progressive matrices, to assess the generalization and reasoning capabilities of deep learning models.

## Key findings

- Existing models, including those popular for vision-and-language tasks, fail on seemingly simple instances of abstract reasoning.
- Models using relational networks fare better but leave substantial room for improvement.
- The benchmark reveals gaps in current deep learning approaches for visual reasoning.

## Abstract

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is a critical concern because generalisation enables robust reasoning over unseen data, whereas leveraging superficial statistics is fragile to even small changes in data distribution. To illuminate the issue and drive progress towards a solution, we propose a test that explicitly evaluates abstract reasoning over visual data. We introduce a large-scale benchmark of visual questions that involve operations fundamental to many high-level vision tasks, such as comparisons of counts and logical operations on complex visual properties. The benchmark directly measures a method's ability to infer high-level relationships and to generalise them over image-based concepts. It includes multiple training/test splits that require controlled levels of generalization. We evaluate a range of deep learning architectures, and find that existing models, including those popular for vision-and-language tasks, are unable to solve seemingly-simple instances. Models using relational networks fare better but leave substantial room for improvement.
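The abstract mentions tasks built on "logical operations on complex visual properties". The following is a minimal toy sketch of that kind of puzzle, not the benchmark's actual data format or any method from the paper. It assumes each image panel has already been reduced to a set of attribute labels (the `Panel` type, the `RELATIONS` table, and the functions `infer_relation` and `solve` are all hypothetical names introduced for illustration); the real benchmark works on images, where extracting such attributes is itself part of the difficulty.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical simplification: a panel is just a set of attribute labels.
Panel = FrozenSet[str]

# Candidate relations that map the first two panels of a row to the third.
RELATIONS: Dict[str, Callable[[Panel, Panel], Panel]] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


def infer_relation(rows: List[Tuple[Panel, Panel, Panel]]) -> List[str]:
    """Return the names of all relations consistent with every complete row."""
    return [
        name
        for name, op in RELATIONS.items()
        if all(op(a, b) == c for a, b, c in rows)
    ]


def solve(
    rows: List[Tuple[Panel, Panel, Panel]],
    last_row: Tuple[Panel, Panel],
    candidates: List[Panel],
) -> int:
    """Return the index of the candidate completing the last row, or -1."""
    a, b = last_row
    for name in infer_relation(rows):
        target = RELATIONS[name](a, b)
        for i, cand in enumerate(candidates):
            if cand == target:
                return i
    return -1


def p(*labels: str) -> Panel:
    """Small helper to build a panel from attribute labels."""
    return frozenset(labels)


# Two complete rows that follow a logical AND over attributes.
rows = [
    (p("dog", "red"), p("dog", "round"), p("dog")),
    (p("cat", "car"), p("car"), p("car")),
]
last_row = (p("tree", "bird"), p("bird", "sky"))
candidates = [p("tree"), p("bird"), p("sky"), p("tree", "bird", "sky")]

answer = solve(rows, last_row, candidates)
print(answer)  # 1
```

With perfectly extracted symbolic attributes the puzzle is trivial to solve by enumeration, which illustrates why the interesting question is whether a model can infer the same relation directly from pixels and apply it to visual concepts it was not trained on.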

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.12271/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1907.12271/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1907.12271/full.md

---
Source: https://tomesphere.com/paper/1907.12271