Summarizing a virtual robot's past actions in natural language
Chad DeChant, Daniel Bauer

TL;DR
This paper introduces the task of generating natural language summaries of a virtual robot's actions, shows how an existing instruction-following dataset can be repurposed to train models for it, and provides baseline results for future research.
Contribution
It defines the new task of robot action summarization, adapts an existing instruction-following dataset for training, and evaluates several methods for generating action summaries.
Findings
Models can generate meaningful summaries from either egocentric video frames or intermediate text representations of the actions used by an automatic planner.
A repurposed instruction-following dataset can effectively support training for robot action summarization.
Baseline quantitative and qualitative results are established for future comparison.
Abstract
We propose and demonstrate the task of giving natural language summaries of the actions of a robotic agent in a virtual environment. We explain why such a task is important and what makes it difficult, and discuss how it might be addressed. To encourage others to work on this task, we show how a popular existing dataset, designed for an instruction-following task, that pairs robot actions with natural language descriptions can be repurposed to serve as a training ground for robot action summarization. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner. We provide quantitative and qualitative evaluations of our results, which can serve as a baseline for future work.
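The repurposing step described in the abstract amounts to reversing the direction of each training pair: an instruction-following dataset maps language to actions, while summarization maps actions to language. The sketch below illustrates this for the text-input setting only, under stated assumptions. The record layout (`Episode`, `planner_actions`, `instruction`), the " ; " linearization, and the example action strings are all hypothetical and are not taken from the paper or from any specific dataset's schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    # Hypothetical record from an instruction-following dataset:
    # planner-level action strings paired with a human-written description.
    planner_actions: List[str]  # e.g. ["GotoLocation countertop", "PickupObject apple"]
    instruction: str            # e.g. "Pick up the apple from the counter."


@dataclass
class SummaryExample:
    source: str  # linearized action sequence (model input)
    target: str  # natural language description (model output)


def to_summary_example(ep: Episode) -> SummaryExample:
    """Flip an instruction-following episode: the executed actions become
    the input, and the human-written description becomes the target."""
    source = " ; ".join(a.strip() for a in ep.planner_actions)
    target = ep.instruction.strip()
    return SummaryExample(source=source, target=target)


def build_dataset(episodes: List[Episode]) -> List[SummaryExample]:
    # Drop episodes that have no actions or an empty description,
    # since they provide no usable (input, target) pair.
    return [
        to_summary_example(ep)
        for ep in episodes
        if ep.planner_actions and ep.instruction.strip()
    ]
```

The resulting (source, target) pairs could then be fed to any sequence-to-sequence model; the video-input setting would replace `source` with egocentric frames, which this sketch does not cover.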
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
