Towards Embodied Scene Description
Sinan Tan, Huaping Liu, Di Guo, Xinyu Zhang, Fuchun Sun

TL;DR
This paper introduces Embodied Scene Description, enabling agents to actively explore environments and generate scene descriptions by learning sensorimotor activities through imitation and reinforcement learning.
Contribution
It presents a novel framework that combines imitation and reinforcement learning for embodied agents to perform scene description tasks.
Findings
Effective on the AI2Thor dataset
Successful real-world robotic implementation
Demonstrates extendability of the approach
Abstract
Embodiment is an important characteristic of all intelligent agents (creatures and robots), yet existing scene description tasks mainly focus on analyzing images passively, and the semantic understanding of the scenario is separated from the interaction between the agent and the environment. In this work, we propose Embodied Scene Description, which exploits the embodiment ability of the agent to find an optimal viewpoint in its environment for scene description tasks. A learning framework based on the paradigms of imitation learning and reinforcement learning is established to teach the intelligent agent to generate the corresponding sensorimotor activities. The proposed framework is tested on both the AI2Thor dataset and a real-world robotic platform, demonstrating the effectiveness and extendability of the developed method.
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
