Bridging Visual Perception with Contextual Semantics for Understanding Robot Manipulation Tasks

Chen Jiang; Martin Jagersand

arXiv:1909.07459·cs.CV·July 28, 2020

TL;DR

This paper introduces a framework that combines vision-language models and ontologies to generate dynamic knowledge graphs from videos, enabling robots to understand and perform manipulation tasks in contextually rich environments.

Contribution

It presents a novel method for integrating visual perception with semantic knowledge graphs for robot manipulation understanding.

Findings

01 Successfully generated high-level knowledge graphs from videos

02 Enabled robots to interpret manipulation scenarios in a kitchen environment

03 Bridged visual perception with contextual semantics effectively

Abstract

Understanding manipulation scenarios allows intelligent robots to plan appropriate actions to complete a manipulation task successfully. It is essential for intelligent robots to semantically interpret manipulation knowledge by describing entities, relations and attributes in a structured manner. In this paper, we propose an implementation framework to generate high-level conceptual dynamic knowledge graphs from video clips. A combination of a Vision-Language model and an ontology system, corresponding to visual perception and contextual semantics respectively, is used to represent robot manipulation knowledge with Entity-Relation-Entity (E-R-E) and Entity-Attribute-Value (E-A-V) tuples. The proposed method is flexible and versatile. Using the framework, we present a case study where a robot performs manipulation actions in a kitchen environment, bridging visual perception with contextual…
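To make the E-R-E and E-A-V representation concrete, the following is a minimal sketch of how such tuples could be stored per video clip and compared over time. It is not the authors' implementation: the class names (`ERE`, `EAV`, `DynamicKnowledgeGraph`), the `changes` helper, and the example entities and relations (robot, cup, table) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Set, Tuple, Union


class ERE(NamedTuple):
    """Entity-Relation-Entity tuple, e.g. (robot, hold, cup)."""
    subject: str
    relation: str
    obj: str


class EAV(NamedTuple):
    """Entity-Attribute-Value tuple, e.g. (cup, color, red)."""
    entity: str
    attribute: str
    value: str


@dataclass
class DynamicKnowledgeGraph:
    """Hypothetical container: one set of tuples per video clip (time step)."""
    relations: Dict[int, Set[ERE]] = field(default_factory=dict)
    attributes: Dict[int, Set[EAV]] = field(default_factory=dict)

    def add(self, step: int, fact: Union[ERE, EAV]) -> None:
        # Route the fact to the store that matches its tuple type.
        store = self.relations if isinstance(fact, ERE) else self.attributes
        store.setdefault(step, set()).add(fact)

    def changes(self, prev: int, curr: int) -> Tuple[Set[ERE], Set[ERE]]:
        """Return (added, removed) relations between two time steps."""
        before = self.relations.get(prev, set())
        after = self.relations.get(curr, set())
        return after - before, before - after


# Illustrative facts only; these are not taken from the paper's case study.
graph = DynamicKnowledgeGraph()
graph.add(0, ERE("robot", "hold", "cup"))
graph.add(0, EAV("cup", "color", "red"))
graph.add(1, ERE("cup", "on", "table"))
graph.add(1, EAV("cup", "color", "red"))

added, removed = graph.changes(0, 1)
```

In this sketch the graph is "dynamic" only in the sense that each clip gets its own tuple set, so a change in the scene shows up as a set difference between consecutive steps. An ontology layer, as described in the abstract, would sit on top of this to constrain which entities, relations and attributes are valid.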

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling