Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

Mingyu Ding; Yan Xu; Zhenfang Chen; David Daniel Cox; Ping Luo; Joshua B. Tenenbaum; Chuang Gan

arXiv:2304.03767·cs.CV·April 10, 2023·6 cites

TL;DR

The Embodied Concept Learner (ECL) lets a robot agent learn visual concepts and depth estimation from human demonstrations and language instructions, without ground-truth semantic or depth supervision, improving instruction following and task reasoning in interactive 3D environments.

Contribution

ECL introduces a modular, self-supervised framework for grounding concepts, constructing semantic maps, and executing tasks from language, with enhanced interpretability and reusability.

Findings

01. Outperforms previous methods on the ALFRED benchmark.

02. Learns semantics and depth through interaction, without ground-truth supervision.

03. Provides a transparent, step-by-step planning process.

Abstract

Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described by natural languages in novel scenes. To mimic such capability, we propose Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts, build semantic maps and plan actions to complete tasks by learning purely from human demonstrations and language instructions, without access to ground-truth semantic and depth supervisions from simulations. ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging…
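To make component (i) concrete, the following is a minimal sketch of what "translating natural language into an executable program" could look like. The abstract does not say how ECL's parser is implemented, so this uses a toy rule-based stand-in; the `Step` type, the `VERB_TO_OP` table, and the operation names are all hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One step of an executable program (hypothetical representation)."""
    op: str      # operation name, e.g. "goto" or "pickup"
    target: str  # object concept the step refers to


# Hypothetical verb-to-operation table; an assumption for illustration only.
VERB_TO_OP = {"go": "goto", "walk": "goto", "pick": "pickup", "grab": "pickup"}
STOPWORDS = {"to", "the", "up", "a", "an"}


def parse_instruction(text: str) -> List[Step]:
    """Toy rule-based parser (a stand-in, not the paper's method).

    Splits the instruction into clauses on commas, "and", and "then",
    maps each clause's leading verb to an operation, and uses the last
    non-stopword as the target concept. Unknown clauses are skipped.
    """
    steps: List[Step] = []
    for clause in re.split(r"\s*(?:,|\band\b|\bthen\b)\s*", text.lower()):
        words = clause.split()
        if not words or words[0] not in VERB_TO_OP:
            continue
        content = [w for w in words[1:] if w not in STOPWORDS]
        if content:
            steps.append(Step(VERB_TO_OP[words[0]], content[-1]))
    return steps


program = parse_instruction("Go to the table and then pick up the mug")
for step in program:
    print(step.op, step.target)
```

In a pipeline like the one the abstract outlines, each `target` would then be handed to the concept-grounding and map-construction components to be located in the scene; those stages are not sketched here because the source gives no detail about them.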

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Reinforcement Learning in Robotics