ReStory: VLM-augmentation of Social Human-Robot Interaction Datasets
Fanjun Bu, Wendy Ju

TL;DR
ReStory is a novel method that uses vision language models to augment scarce in-the-wild human-robot interaction datasets by synthesizing human-interpretable interaction scenarios, aiding researchers and designers.
Contribution
ReStory introduces a semi-automated approach to generate interpretable interaction scenarios from limited data, enhancing dataset utility for HRI research.
Findings
Enables synthesis of human-interpretable interaction storyboards
Augments existing datasets, though human supervision is still required
Facilitates better analysis and design of HRI systems
Abstract
Internet-scale datasets are a luxury for human-robot interaction (HRI) researchers, as collecting natural interaction data in the wild is time-consuming and logistically challenging. The problem is exacerbated by robots' different form factors and interaction modalities. Inspired by recent work on ethnomethodology and conversation analysis (EMCA) in the domain of HRI, we propose ReStory, a method that has the potential to augment existing in-the-wild human-robot interaction datasets by leveraging Vision Language Models. While still requiring human supervision, ReStory is capable of synthesizing human-interpretable interaction scenarios in the form of storyboards. We hope our proposed approach provides HRI researchers and interaction designers with a new angle on utilizing their valuable and scarce data.
Taxonomy
Topics: Anomaly Detection Techniques and Applications
