Learning Social Affordance Grammar from Videos: Transferring Human Interactions to Human-Robot Interactions

Tianmin Shu; Xiaofeng Gao; Michael S. Ryoo; Song-Chun Zhu

arXiv:1703.00503 · cs.RO · March 3, 2017 · 6 cites

TL;DR

This paper introduces a framework for learning social affordance grammar from RGB-D videos of human interactions, enabling humanoid robots to infer and perform human-like interactions in real time, with effectiveness demonstrated in Baxter simulation, human evaluation, and a real Baxter test.

Contribution

The paper presents a weakly supervised method for learning a hierarchical social affordance grammar, represented as a spatiotemporal AND-OR graph (ST-AOG), from RGB-D videos and transferring it to human-robot interaction.

Findings

01

Successfully generates human-like behaviors in unseen scenarios

02

Outperforms both baselines in the reported experiments

03

Enables real-time motion inference for humanoid robots

Abstract

In this paper, we present a general framework for learning social affordance grammar as a spatiotemporal AND-OR graph (ST-AOG) from RGB-D videos of human interactions, and for transferring the grammar to humanoids to enable real-time motion inference for human-robot interaction (HRI). Based on Gibbs sampling, our weakly supervised grammar learning automatically constructs a hierarchical representation of an interaction, with long-term joint sub-tasks of both agents and short-term atomic actions of individual agents. Using a new RGB-D video dataset with rich instances of human interactions, our experiments (Baxter simulation, human evaluation, and a real Baxter test) demonstrate that the model learned from limited training data successfully generates human-like behaviors in unseen scenarios and outperforms both baselines.
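To make the AND-OR graph idea in the abstract more concrete, the following is a minimal sketch of a generic AND-OR grammar in Python. It is not the paper's ST-AOG and does not implement its Gibbs-sampling learning: it only shows how AND nodes compose children in sequence, how OR nodes choose one alternative, and how a parse can be sampled. The `Node` class, the `sample_parse` function, the node names, and the branch weights are all hypothetical and chosen purely for illustration.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a toy AND-OR graph (hypothetical structure, not the paper's)."""
    name: str
    kind: str  # "and", "or", or "leaf"
    children: List["Node"] = field(default_factory=list)
    # Branch weights are only used by OR nodes; None means uniform choice.
    weights: Optional[List[float]] = None


def sample_parse(node: Node, rng: random.Random) -> List[str]:
    """Sample one parse and return its leaf names in order."""
    if node.kind == "leaf":
        return [node.name]
    if node.kind == "and":
        # An AND node expands all of its children, in sequence.
        leaves: List[str] = []
        for child in node.children:
            leaves.extend(sample_parse(child, rng))
        return leaves
    if node.kind == "or":
        # An OR node selects exactly one child according to its weights.
        chosen = rng.choices(node.children, weights=node.weights, k=1)[0]
        return sample_parse(chosen, rng)
    raise ValueError(f"unknown node kind: {node.kind!r}")


# Illustrative grammar for a handshake-like interaction (assumed names and weights).
grammar = Node(
    "interaction",
    "and",
    children=[
        Node("approach", "leaf"),
        Node(
            "reach",
            "or",
            children=[
                Node("reach_right_hand", "leaf"),
                Node("reach_left_hand", "leaf"),
            ],
            weights=[0.8, 0.2],
        ),
        Node("shake", "leaf"),
    ],
)

if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_parse(grammar, rng))
```

In this toy version the leaves are plain strings. In a spatiotemporal grammar of the kind the abstract describes, the hierarchy would instead separate long-term joint sub-tasks of both agents from short-term atomic actions of individual agents, and the structure would be learned from video rather than written by hand.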

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications