I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang, Brian Liang, Varad Dhat, Zander Brumbaugh, Nick Walker, Ranjay Krishna, Maya Cakmak

TL;DR
This paper presents RONAR, an LLM-based system that translates multi-modal robot experiences into natural language, improving transparency, failure analysis, and human-robot interaction.
Contribution
Introduces RONAR, a novel multi-modal framework for natural language narration of robot experiences, along with a new real-robot dataset and empirical validation.
Findings
RONAR outperforms state-of-the-art methods across the evaluated scenarios.
It improves failure recovery efficiency.
It improves the user experience with respect to system transparency.
Abstract
Understanding robot behaviors and experiences through natural language is crucial for developing intelligent and transparent robotic systems. Recent advances in large language models (LLMs) make it possible to translate complex, multi-modal robotic experiences into coherent, human-readable narratives. However, grounding real-world robot experiences in natural language is challenging for many reasons, such as the multi-modal nature of the data, differing sample rates, and data volume. We introduce RONAR, an LLM-based system that generates natural language narrations from robot experiences, aiding in behavior announcement, failure analysis, and human interaction to recover from failure. Evaluated across various scenarios, RONAR outperforms state-of-the-art methods and improves failure recovery efficiency. Our contributions include a multi-modal framework for robot experience narration, a…
Taxonomy
Topics: Multimodal Machine Learning Applications
