Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction

Kaiqi Chen; Jing Yu Lim; Kingsley Kuan; Harold Soh

arXiv:2308.06498 · cs.AI · August 15, 2023

TL;DR

This paper introduces LEAPT, a deep probabilistic model enabling robots to perform perspective-taking by inferring what humans see and believe, improving understanding in human-robot interactions under uncertainty.

Contribution

The work presents a decomposed multi-modal latent state space model that generates and augments fictitious observations (emissions), allowing a robot to infer a human's perspective in partially observable settings.

Findings

01. Outperforms existing baselines in predicting human observations and beliefs

02. Successfully infers visual observations and internal beliefs of humans

03. Demonstrates robustness in three partially-observable HRI tasks

Abstract

Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved problem; existing approaches that use deterministic or handcrafted methods are unable to accurately account for uncertainty in partially-observable settings. This work proposes to address this limitation via a deep world model that enables a robot to perform both perceptual and conceptual perspective-taking, i.e., the robot is able to infer what a human sees and believes. The key innovation is a decomposed multi-modal latent state space model able to generate and augment fictitious observations/emissions. Optimizing the ELBO that arises from this probabilistic graphical model enables the learning of uncertainty in latent space, which facilitates uncertainty…
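
For orientation, the kind of objective the abstract refers to can be sketched with the generic evidence lower bound (ELBO) of a sequential latent state space model. This is a textbook form given under assumed notation, not the paper's actual objective: the symbols s_t (latent state), o_t (observation/emission), a_t (action), the generative model p_theta, and the inference model q_phi are all illustrative choices, and the decomposed multi-modal structure of LEAPT is not represented here.

```latex
% Generic one-step ELBO for a latent state space model (illustrative notation,
% not taken from the paper). s_t: latent state, o_t: observation, a_t: action.
\log p_\theta(o_{1:T} \mid a_{1:T})
  \;\ge\; \sum_{t=1}^{T} \Big(
      \mathbb{E}_{q_\phi(s_t)}\big[ \log p_\theta(o_t \mid s_t) \big]
      \;-\; \mathbb{E}_{q_\phi(s_{t-1})}\Big[
          \mathrm{KL}\big( q_\phi(s_t \mid s_{t-1}, a_{t-1}, o_t)
          \,\big\|\, p_\theta(s_t \mid s_{t-1}, a_{t-1}) \big) \Big]
  \Big)
```

The first term rewards latent states that reconstruct the observations; the KL term keeps the inferred posterior close to the learned transition prior, which is how a model of this kind represents uncertainty in latent space.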

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques