Communicative Learning with Natural Gestures for Embodied Navigation Agents with Human-in-the-Scene

Qi Wu; Cheng-Ju Wu; Yixin Zhu; Jungseock Joo

arXiv:2108.02846 · cs.AI · August 9, 2021

TL;DR

This paper explores using natural human gestures as a communication interface to improve embodied agent navigation, introducing a VR simulation environment and demonstrating that gesture-based cues enhance navigation performance.

Contribution

The study develops Ges-THOR, a VR environment for gesture-based communication, and shows that natural gestures, even without predefined semantics, can significantly improve navigation performance.

Findings

1. Gesture cues improve navigation performance.

2. Natural gestures outperform verbal instructions in experiments.

3. Mutual learning of gestures and navigation enhances agent capabilities.

Abstract

Human-robot collaboration is an essential research topic in artificial intelligence (AI): it enables researchers to devise cognitive AI systems and affords users an intuitive means of interacting with robots. Communication plays a central role in such collaboration. To date, prior studies in embodied agent navigation have only demonstrated that human language facilitates communication through natural-language instructions, leaving many other forms of communication unexplored. In fact, human communication originated in gestures and is often delivered through multimodal cues, e.g., "go there" accompanied by a pointing gesture. To bridge this gap and fill in the missing dimension of communication in embodied agent navigation, we propose investigating the effects of using gestures as the communicative interface instead of verbal cues. Specifically, we develop a VR-based 3D simulation…

Code & Models

Repositories

qiwu57kevin/iros2021-gesthor (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Speech and Dialogue Systems