# Does computer vision matter for action?

**Authors:** Brady Zhou, Philipp Krähenbühl, and Vladlen Koltun

arXiv: 1905.12887 · 2019-10-23

## TL;DR

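This study asks whether intermediate computer vision representations matter for action by comparing sensorimotor models with and without them in immersive simulations of urban driving, off-road trail traversal, and battle. Models equipped with intermediate representations train faster, achieve higher task performance, and generalize better to previously unseen environments.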

## Contribution

The paper provides empirical evidence that intermediate computer vision representations improve training speed, task performance, and generalization in action-oriented tasks.

## Key findings

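- Models equipped with intermediate representations train faster than models trained directly from pixels to actions.
- Models equipped with intermediate representations achieve higher task performance.
- Models equipped with intermediate representations generalize better to previously unseen environments.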

## Abstract

Computer vision produces representations of scene content. Much computer vision research is predicated on the assumption that these intermediate representations are useful for action. Recent work at the intersection of machine learning and robotics calls this assumption into question by training sensorimotor systems directly for the task at hand, from pixels to actions, with no explicit intermediate representations. Thus the central question of our work: Does computer vision matter for action? We probe this question and its offshoots via immersive simulation, which allows us to conduct controlled reproducible experiments at scale. We instrument immersive three-dimensional environments to simulate challenges such as urban driving, off-road trail traversal, and battle. Our main finding is that computer vision does matter. Models equipped with intermediate representations train faster, achieve higher task performance, and generalize better to previously unseen environments. A video that summarizes the work and illustrates the results can be found at https://youtu.be/4MfWa2yZ0Jc

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12887/full.md

## Figures

47 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12887/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1905.12887/full.md

---
Source: https://tomesphere.com/paper/1905.12887