# Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes

**Authors:** Stephanie M. Lukin, Claire Bonial, and Clare R. Voss

arXiv: 1906.00038 · 2019-09-25

## TL;DR

This paper introduces the task of Visual Understanding and Narration, in which a robot or other agent generates descriptive text for the images it captures while navigating. The goal is a deeper understanding and explanation of the visual scenes it encounters.

## Contribution

The paper formalizes visual narration as a task for robots and proposes methods for generating open-ended descriptive text from the visual data they collect.

## Key findings

- Proposed a framework for visual narration in robotic navigation
- Demonstrated the system's ability to answer open-ended questions about scenes
- Improved understanding of scene context through narration

## Abstract

We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as 'what happens, or might have happened, here?'

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.00038/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1906.00038/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1906.00038/full.md

---
Source: https://tomesphere.com/paper/1906.00038