Towards Embodied Scene Description

Sinan Tan; Huaping Liu; Di Guo; Xinyu Zhang; Fuchun Sun

arXiv:2004.14638 · cs.RO · May 8, 2020 · 1 citation

Open Access

TL;DR

This paper introduces Embodied Scene Description, a task in which an agent actively explores its environment to find a viewpoint suited to describing the scene, with the required sensorimotor activities learned through imitation and reinforcement learning.

Contribution

It presents a novel framework that combines imitation and reinforcement learning for embodied agents to perform scene description tasks.

Findings

01 Effective on the AI2Thor dataset

02 Successful implementation on a real-world robotic platform

03 Demonstrates the extendability of the approach

Abstract

Embodiment is an important characteristic of all intelligent agents (creatures and robots), yet existing scene description tasks mainly analyze images passively, separating the semantic understanding of the scenario from the interaction between the agent and its environment. In this work, we propose Embodied Scene Description, which exploits the agent's embodiment to find an optimal viewpoint in its environment for scene description tasks. A learning framework combining imitation learning and reinforcement learning is established to teach the agent to generate the corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real-world robotic platform, demonstrating the effectiveness and extendability of the developed method.

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning