Gesture-Informed Robot Assistance via Foundation Models

Li-Heng Lin; Yuchen Cui; Yilun Hao; Fei Xia; Dorsa Sadigh

arXiv:2309.02721 · cs.RO · September 8, 2023 · 2 cites

Open Access

TL;DR

GIRAF leverages large language models to interpret human gestures and language instructions, significantly improving robot understanding and collaboration in tabletop tasks through flexible, context-aware reasoning.

Contribution

The paper introduces GIRAF, a novel framework that uses large language models for flexible gesture and instruction interpretation in human-robot interaction.
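
To illustrate what "flexible" interpretation could mean in contrast with a fixed gesture table, here is a minimal sketch. It is not the paper's implementation: the names `GESTURE_TABLE`, `lookup_gesture`, and `build_prompt`, as well as the prompt wording, are hypothetical and chosen only for illustration. The sketch stops at assembling the text that a language model might be given and does not call any model.

```python
from typing import List

# Hypothetical fixed gesture table: each gesture maps to exactly one meaning,
# regardless of the scene or of what the person says.
GESTURE_TABLE = {"point": "pick up the indicated object"}


def lookup_gesture(gesture: str) -> str:
    """Rigid interpretation: the same answer whatever the context."""
    return GESTURE_TABLE.get(gesture, "unknown gesture")


def build_prompt(scene_objects: List[str], gesture: str,
                 target: str, instruction: str) -> str:
    """Combine scene, gesture, and spoken instruction into one text prompt.

    The prompt format is an assumption made for this sketch, not the
    format used by the authors.
    """
    lines = [
        "Objects on the table: " + ", ".join(scene_objects),
        f"The human makes a '{gesture}' gesture toward: {target}",
        f'The human says: "{instruction}"',
        "What should the robot do?",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # The lookup ignores context; the prompt carries it along for a model to reason over.
    print(lookup_gesture("point"))
    print(build_prompt(["tape", "scissors", "box"], "point", "scissors", "Hand me that"))
```

The point of the contrast is that the table returns one meaning per gesture, while the prompt keeps the scene and the spoken instruction together so that a downstream model could, in principle, resolve the gesture differently in different situations.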

Findings

1. 70% higher success rate than baseline in gesture interpretation

2. 81% success rate on diverse gesture-based task planning

3. Effective and user-preferred in human-robot collaboration

Abstract

Gestures serve as a fundamental and significant mode of non-verbal communication among humans. Deictic gestures (such as pointing towards an object), in particular, offer valuable means of efficiently expressing intent in situations where language is inaccessible, restricted, or highly specialized. As a result, it is essential for robots to comprehend gestures in order to infer human intentions and establish more effective coordination with them. Prior work often relies on a rigid hand-coded library of gestures along with their meanings. However, interpretation of gestures is often context-dependent, requiring more flexibility and common-sense reasoning. In this work, we propose a framework, GIRAF, for more flexibly interpreting gesture and language instructions by leveraging the power of large language models. Our framework is able to accurately infer human intent and contextualize the…

Taxonomy

Topics: Multimodal Machine Learning Applications · Hand Gesture Recognition Systems · Natural Language Processing Techniques