On the Strengths and Weaknesses of Data for Open-set Embodied Assistance
Pradyumna Tambwekar, Andrew Silva, Deepak Gopinath, Jonathan DeCastro, Xiongyi Cui, Guy Rosman

TL;DR
This paper investigates how well multimodal foundation models generalize in assistive tasks. It focuses on open-set scenarios, where models must handle unseen user behaviors and new configurations, and evaluates them on synthetic datasets in a simulated environment.
Contribution
It introduces the concept of open-set corrective assistance, explores its challenges, and demonstrates how diverse, multimodal assistive datasets improve model generalization in synthetic domains.
Findings
Models benefit from datasets covering diverse assistance aspects.
Synthetic datasets enable evaluation of open-set assistive capabilities.
Multimodal grounding and defect inference are crucial for performance.
Abstract
Embodied foundation models are increasingly performant in real-world domains such as robotics or autonomous driving. These models are often deployed in interactive or assistive settings, where it is important that these assistive models generalize to new users and new tasks. Diverse interactive data generation offers a promising avenue for providing data-efficient generalization capabilities for interactive embodied foundation models. In this paper, we investigate the generalization capabilities of a multimodal foundation model fine-tuned on diverse interactive assistance data in a synthetic domain. We explore generalization along two axes: a) assistance with unseen categories of user behavior and b) providing guidance in new configurations not encountered during training. We study a broad capability called Open-Set Corrective Assistance, in which the model needs to inspect…
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
